Storing and Managing Massive-Scale Data

Twitter generates more than seven terabytes (TB) of data every day; Facebook generates 4.5 billion likes every day, and 300 million photos are uploaded to Facebook per day. The world is expected to face more than 44 zettabytes of data by 2020. Handling data is becoming enormously challenging, not only because of its scale but also because of its variety. In short, storing large volumes of diverse data is a major challenge for many organizations today.

This course is about storing and managing huge volumes of data, including structured relational data and unstructured text, video, and audio. It covers the fundamental concepts of data storage, specifically the concepts of NoSQL databases. It covers in detail how to store and query data with NoSQL database engines such as MongoDB, Apache HBase, and Apache Cassandra, and with file-based storage such as the Hadoop Distributed File System, the Lustre file system, and the Ceph file system. In addition, the course covers technologies for storing and querying relational data (SQL) on Hadoop, including Apache Hive, Cloudera Impala, and Apache Drill. It also covers the graph database technology Neo4j in detail. During the course, a real-world data storage and management scenario will be simulated.

Course Objectives

  • Provide strong conceptual knowledge of NoSQL databases and file-based storage.
  • Explain various methods of storing data in NoSQL engines or in files.
  • Explain the components and architectures of NoSQL technologies including Apache Cassandra, Redis, MongoDB, and Neo4j.
  • Explain the components and architectures of file-based storage, including HDFS, the Lustre file system, and Ceph.
  • Provide hands-on knowledge of how to store data in NoSQL databases.
  • Provide hands-on knowledge of how to store data in file systems.
  • Present and describe the simulation of a real-world data storage scenario.

What is in it for the Participants?

  • Learning the fundamental concepts of NoSQL and file-based storage.
  • Learning the methods to store and query data.
  • Being able to design and deploy schemas to store data in the scalable column-oriented database engines Apache Cassandra and Apache HBase.
  • Being able to design and deploy schemas to store data in the scalable document-oriented database engine MongoDB.
  • Being able to design, implement, and deploy schemas to store data in the scalable graph database engine Neo4j.
  • Being able to design and implement schemas to store data in the scalable key-value store Redis.
  • Being able to design and implement schemas to store data in Apache Hive.
  • Being able to design and implement file-based storage using HDFS.