Theory and Practice of Real-time Data Processing

Real-time data processing is a pressing need for many organizations today, because it lets them analyze and understand, for instance, customer sentiment as it emerges. However, real-time applications are challenging to develop, deploy, and control on distributed infrastructure. This course covers a wide range of topics in real-time data processing technologies, including their theoretical aspects. It also covers several implementation tasks for real-time applications, ranging from configuring a real-time data processing cluster to writing application code that runs processing tasks. In addition, a real-world, end-to-end data processing scenario is simulated during the course.

Course Objectives

  • Provide strong conceptual knowledge of real-time data processing technologies.
  • Provide a solid understanding of the theoretical foundations of stream processing technologies.
  • Provide practical knowledge of setting up single-node and multi-node Apache Storm clusters.
  • Provide practical knowledge of maintaining and managing a real-time data processing cluster.
  • Provide guidelines on how to tune the runtime performance of the real-time data processing technology Apache Storm.
  • Present and describe a simulation of a real-world data processing scenario.

What is in it for the Participants?

  • Learning the basic concepts of data processing technologies including architectural styles, data management, and data algorithms.
  • Learning the core architecture and components of data processing technologies including Apache Storm, Apache Spark, Apache Flink, Apache Tez, Apache Hama, and Apache Hadoop MapReduce.
  • Learning the theoretical foundations of real-time data processing technologies.
  • Being able to configure single-node and multi-node Apache Storm clusters.
  • Being able to deploy and manage real-time processing jobs on a single node or in a distributed cluster.
  • Being able to administer and maintain an Apache Storm cluster.
  • Learning about the challenging issues that arise in Apache Storm clusters.