Data Processing Concepts and Techniques

High-quality data is of paramount importance for efficient analysis and knowledge extraction. Therefore, raw data collected from different sources must be processed before it can be used. Data preprocessing is a collection of tasks that includes scrubbing, wrangling, enriching, and partitioning data. However, processing data in real time or at large scale is highly challenging. This course covers data processing concepts and methodologies, including the methods used to implement solutions for assessing data quality and various tools for cleaning, transforming, enriching, and partitioning data. During the course, a real-world data processing scenario will also be simulated.

Course Objectives

  • Provide conceptual knowledge of data processing methods.
  • Explain algorithms for processing various types of data, including structured, unstructured, and semi-structured data.
  • Explain techniques for filtering and partitioning massive-scale data.
  • Provide solid knowledge of how to distribute data efficiently across a cluster.
  • Provide a strong understanding of how processing tasks can be performed to ensure data quality.

What is in it for the Participants?

  • Learning how to handle the variety and volume of data during preprocessing.
  • Being able to develop solutions for processing different types of data, including structured data (SQL tables) and unstructured data such as text, images, geographical data, audio, and video.
  • Learning how to assess the quality of data and exploring best practices for data cleaning and transformation.
  • Being able to identify unnecessary or insignificant data using filtering techniques, and to prepare datasets that are relevant to the analysis.
  • Learning how to enrich data by integrating additional data that is relevant to the subject of the analysis.
  • Being able to design and implement a data processing solution that ensures the quality of data.