Theory and Practice of Storing and Managing Big Data

Handling massive-scale data is becoming increasingly challenging, not only because of its scale but also because of its variety. This course is about storing and managing huge volumes of data of different types, such as structured relational data and unstructured text, video, and audio. The course covers the fundamental concepts of data storage and the theoretical aspects of NoSQL and file-based data storage. It covers the details of storing data in NoSQL engines such as MongoDB, Apache HBase, and Apache Cassandra, as well as in file-based storage such as the Hadoop Distributed File System (HDFS), the Lustre file system, and the Ceph file system. In addition, the course covers technologies for storing and querying relational data (SQL) on scalable, distributed Hadoop storage, including Apache Hive, Cloudera Impala, and Apache Drill. It also covers the graph database Neo4j in detail, specifically storing and querying data with it. During the course, a real-world data storage and management scenario will be simulated.

Course Objectives

  • Provide strong conceptual knowledge of NoSQL databases and file-based storage.
  • Provide a theoretical foundation for NoSQL and file-based data storage.
  • Explain the methods of storing data in NoSQL engines or in files.
  • Explain the components and architectures of NoSQL technologies including Apache Cassandra, Redis, MongoDB, and Neo4j.
  • Explain the components and architectures of file-based storage including HDFS, the Lustre file system, and Ceph.
  • Provide hands-on knowledge of how to store data in NoSQL databases.
  • Provide hands-on knowledge of how to store data in file systems.
  • Present and describe the simulation of a real-world data storage scenario.

What is in it for the Participants?

  • Learning the fundamental concepts of NoSQL and file-based storage.
  • Learning the theoretical aspects of NoSQL and file systems.
  • Learning methods to store massive-scale data.
  • Being able to store data in scalable column-oriented database engines including Apache Cassandra.
  • Being able to store data in scalable document-oriented database engines including MongoDB.
  • Being able to store data in scalable graph database engines including Neo4j.
  • Being able to store data in the scalable key-value store Redis.
  • Being able to store data in Apache Hive.
  • Being able to design and execute analytical queries in NoSQL engines.
  • Being able to contribute to designing and developing file-based storage using HDFS.