4.BDT: Distributed Big Data Processing

  • Code

    L.26848

  • Amount of hours required

    140

  • Quartile of execution

    1

Data is now generated and collected in enormous quantities. This module explains how to process "Big Data" (whether structured or unstructured, generated from multiple sources, or arriving at different speeds) by setting up a pipeline of components through which the data can be processed efficiently. It covers a range of components and architectures, including batch processing and streaming architectures, with a focus on understanding distributed data processing.

Competences

  • HBO ICT 14.3 SW/REA/3

Learning goals

The student understands the issues involved in batch distributed data processing using MapReduce, and can describe and code a solution.
The student understands the issues involved in in-memory batch distributed data processing with Spark, using RDDs and DataFrames, and can describe and code a solution.
The student understands the issues involved in distributed data stream processing, and can describe and code a solution using Kafka and Spark Structured Streaming.
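To illustrate the MapReduce model named in the first learning goal, here is a minimal single-process sketch in Python of the classic word-count job. It only simulates the map, shuffle and reduce phases in memory. It is not the Hadoop MapReduce API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, which a real framework
    # does across the network between the map and reduce stages.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values collected for a single key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Each line stands in for an input split handled by one mapper
    # (a simplifying assumption; here everything runs in one process).
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data streams are data too"]
    print(word_count(docs))
```

In a distributed setting, the map and reduce functions stay the same. The framework runs many mapper and reducer instances in parallel and handles the shuffle between them.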

Tests