Big Data Processing in Real Time

Unlike classic batch processing, real-time data processing lets you explore data and extract knowledge on the fly, as the data arrives. Learning about products and services from Twitter, Facebook, blogs, news feeds, or any other data stream in real time can give enterprises a significant competitive advantage. Real-time solutions such as analytics therefore have the potential to be of great current and future value to industries of all types and scales. Many organizations now urgently need real-time data processing because it helps them analyze data and extract knowledge in seconds or minutes. However, real-time applications are challenging to develop, deploy, and control on distributed infrastructure. This course covers the fundamental concepts of a wide range of technologies and the methodology for implementing a real-time solution, specifically the development steps: building a real-time data processing cluster, writing application code, and deploying the solution to run processing tasks. The course also simulates a real-world, end-to-end data processing scenario.

Course Objectives

  • Provide strong conceptual knowledge of real-time data processing technologies.
  • Provide strong knowledge of fault-tolerance in a distributed architecture.
  • Provide practical knowledge of setting up single-node and multi-node Apache Storm clusters.
  • Provide practical knowledge of developing a real-time data processing solution.
  • Provide practical knowledge of maintaining and managing data processing clusters.
  • Provide guidelines on tuning the runtime performance of data processing technologies, specifically Apache Storm.
  • Present and describe a simulation of a real-world data processing scenario.

What is in it for the Participants?

  • Learning the basic concepts of data processing, including methods, architectural styles, and data algorithms.
  • Learning the core architecture and components of data processing technologies, including Apache Storm, Apache Spark, Apache Flink, Apache Tez, Apache Hama, and Apache Hadoop MapReduce.
  • Being able to configure single-node and multi-node Apache Storm clusters.
  • Being able to implement a real-time data processing solution.
  • Being able to deploy and manage real-time data processing jobs on a single node or in a distributed cluster.
  • Being able to administer and maintain an Apache Storm cluster.
  • Learning about the challenging issues of Apache Storm clusters.