Big Data Computing (full time)
Timing: Full time: Monday, Part-time: Sunday
Scalable data analytics workloads introduce new challenges: how do you scale out your data analytics systems when your data size and computational needs can’t be satisfied by a single computer?
In this course we will cover the basics of distributed data analytics systems and take a look at the architecture of the de facto standard distributed tool, Apache Hadoop, including data storage (HDFS) and resource management (YARN). In the second half of the course, you will learn how to use a popular modern distributed compute engine, Apache Spark, to execute traditional, real-time and machine learning workloads in a scalable fashion.