Large Scale Data Engineering

Please go to http://event.cwi.nl/lsde for more information.

If you are an enrolled student, please join a practicum group.


COURSE OBJECTIVE

The goal of the course is to gain insight into and experience with
algorithms and infrastructures for managing big data.

COURSE CONTENT

This course confronts students with data management tasks in which the
sheer size of the data makes naive solutions, as well as solutions that
work only on a single machine, impractical. Solving such tasks requires
the computer scientist to have insight into the main factors that
underlie algorithm performance (access pattern, hardware latency and
bandwidth), as well as the skills and experience to manage large-scale
computing infrastructure.

FORM OF TUITION

There are two lectures per week, and the course requires significant
practical work. The practicals are done outside lecture hours, at the
students' own discretion; students are supported remotely through Skype
screen sharing.

TYPE OF ASSESSMENT

In the first assignment, students can work either on their own laptops
via a prepared VM or in the cloud using an Amazon EC2 Micro Instance,
and there is an online competition between practicum teams for the best
result. The second assignment is done on the SurfSARA Hadoop cluster
(90 machines, 720 cores, 1.2PB storage). For this assignment, a report
of 5-8 pages must be written. Students also need to read two scientific
papers of their choice related to the second assignment and present them
in class. There is no written exam; the grade is based on the grades for
the two assignments, the grade for the in-class presentation, and
attendance/participation.

COURSE READING

Scientific papers provided in the course

ENTRY REQUIREMENTS

Hadoop environments consist of Linux machines, so some basic ability to
work with Linux comes in handy. You must also have some programming
skills in C, C++, or Java.

RECOMMENDED BACKGROUND KNOWLEDGE

Programming proficiency in C/C++ or Java

TARGET AUDIENCE

mCS, mPDCS
