Big Data Processing in Real-time
Unlike classic batch processing, real-time data processing makes it possible to explore data and extract knowledge on the fly. Learning about products and services from Twitter, Facebook, blogs, news feeds, or any other data stream in real time can give enterprises a significant competitive advantage. Real-time solutions such as analytics therefore have the potential to be of great current and future value to industries of all types and scales. Many organizations now need real-time data processing because it lets them analyze data and extract knowledge within seconds or minutes. However, real-time applications are challenging to develop, deploy, and control on distributed infrastructure. This course covers the fundamental concepts of a wide range of data processing technologies. It also covers the methodology for implementing a real-time solution, specifically the development steps: building a real-time data processing cluster, writing the application code, and deploying the solution to run processing tasks. In addition, a real-world end-to-end data processing scenario will be simulated during the course.
- Provide strong conceptual knowledge of real-time data processing technologies.
- Provide strong knowledge of fault-tolerance in a distributed architecture.
- Provide practical knowledge of setting up single-node and multi-node Apache Storm clusters.
- Provide practical knowledge of developing a real-time data processing solution.
- Provide practical knowledge of maintaining and managing data processing clusters.
- Provide guidelines on how to tune the runtime performance of data processing technologies, specifically Apache Storm.
- Present and describe a simulation of a real-world data processing scenario.
What is in it for the Participants?
- Learning the basic concepts of data processing, including methods, architectural styles, and data algorithms.
- Learning the core architecture and components of data processing technologies, including Apache Storm, Apache Spark, Apache Flink, Apache Tez, Apache Hama, and Apache Hadoop MapReduce.
- Being able to configure single-node and multi-node Apache Storm clusters.
- Being able to implement a real-time data processing solution.
- Being able to deploy and manage real-time data processing jobs on a single node or in a distributed cluster.
- Being able to administer and maintain an Apache Storm cluster.
- Learning about the challenging issues of running an Apache Storm cluster.