Batch Style Analytics with Big Data

Batch style analytics (also known as offline analytics) is performed on historical data. It is used to develop models for a wide variety of tasks, such as recommending books or videos and detecting anomalies in network traffic. Batch style techniques are also used to develop predictive models that extract knowledge from data to forecast outcomes such as an organization's sales. This course is about designing and developing batch style analytics models using machine learning or basic statistical methods. The course also covers the relevant technologies in detail, including machine learning libraries such as Apache Mahout and MLlib, and shows how to deploy and run analytics models on Spark and Hadoop clusters. During the course, a real-world batch analytics scenario will be simulated.

Course Objectives

  • Provide a solid background in batch style analytics libraries, including MLlib and Apache Mahout.
  • Provide hands-on knowledge of how to configure machine learning libraries, specifically integrating Apache Mahout with Hadoop and MLlib with Apache Spark.
  • Provide hands-on knowledge of how to build analytics models using the libraries, and how to deploy and run them on single-node and distributed clusters.
  • Provide hands-on knowledge of implementing and deploying jobs on single-node and multi-node MLlib-Spark and Mahout-Hadoop clusters.
  • Present and describe the simulation of a real-world scenario of analyzing Big Data using the batch method.

What is in it for the Participants?

  • Learning the fundamentals of batch style data analytics technologies, including Apache Mahout and MLlib.
  • Being able to configure a batch analytics system by combining Apache Spark and MLlib.
  • Being able to configure a batch analytics system by integrating Apache Mahout and Hadoop.
  • Being able to implement an analytics model using MLlib and deploy it on single-node and multi-node Apache Spark clusters.
  • Being able to implement an analytics model using Apache Mahout and deploy it on single-node and multi-node Apache Hadoop clusters.
  • Being able to administer and manage an Apache Spark cluster.