Theory and Practice of Batch Style Analytics with Big Data

Batch style analytics is performed on historical data. It is used to develop models for a wide variety of tasks, such as recommending books or videos or detecting anomalies in network traffic. Batch style analysis methods are also used to develop predictive models that extract knowledge and forecast outcomes, such as an organization's sales. This course covers the theoretical foundations of batch style analytics technologies, as well as the design and development of batch style analytics models using machine learning and basic statistical methods. It also covers machine learning libraries in detail, including Apache Mahout and MLlib, and shows how to deploy and run analytics models on Spark and Hadoop clusters.

Career Objectives

  • Provide the fundamental concepts of batch style data.
  • Provide the theoretical background of batch style analytics technologies.
  • Provide hands-on knowledge of how to configure machine learning libraries, specifically integrating Apache Mahout with Hadoop and MLlib with Apache Spark.
  • Provide hands-on knowledge of how to configure and deploy single-node and multi-node MLlib-Spark and Mahout-Hadoop clusters.
  • Provide hands-on knowledge of how to build and implement analytics models using these libraries, and how to deploy and run them on single-node and distributed clusters.
  • Present and describe a simulation of a real-world scenario of analyzing Big Data using batch methods.

What is in it for the Participants?

  • Learning the fundamentals of batch style data analytics technologies, including Apache Mahout and MLlib.
  • Learning the theory of batch style data analytics technologies.
  • Being able to configure a batch analytics system by combining Apache Spark and MLlib.
  • Being able to configure a batch analytics system by combining Apache Mahout and Hadoop.
  • Being able to implement an analytics model using MLlib and deploy it on single-node and multi-node Apache Spark clusters.
  • Being able to implement an analytics model using Apache Mahout and deploy it on single-node and multi-node Apache Hadoop clusters.