4.BDT: Distributed Big Data Processing

Code
L.26848
Number of hours required
140
Quartile of execution
1

Data is now generated and collected in huge quantities. This module explains how to process "Big Data" (structured or unstructured, varied in form, coming from multiple sources and arriving at different speeds) by setting up a pipeline of components through which the data can be processed efficiently. A range of components and architectures is covered, including batch processing and streaming architectures. The focus is on understanding the distributed processing of data.

Competences

HBO ICT 14.3 SW/REA/3

Learning goals

The student understands the issues linked to batch distributed data processing using MapReduce, and can describe and code a solution.
The student understands the issues linked to in-memory batch distributed data processing with Spark, RDDs and DataFrames, and can describe and code a solution.
The student understands the issues linked to distributed data stream processing, and can describe and code a solution using Kafka and Spark Structured Streaming.
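
As an illustration of the first learning goal, the MapReduce model can be sketched in plain Python on a single machine. This is a minimal sketch, not course material: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for the example, and a real framework such as Hadoop would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    sample = ["big data big pipeline", "data stream"]
    print(word_count(sample))
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase depends only on its own key, both steps can be distributed over separate workers; only the shuffle step needs data to move between them.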

Tests

Code	Name
WC	Tutorial (Werkcollege)
T.52268	4.BDT: Distributed Big Data Processing