Course Description
This course familiarizes participants with different aspects of large data sets and how they are managed both on site and in the cloud. Emphasis is placed on hands-on experience, from data ingestion to analysis of large data sets, covering both data-at-rest and data-in-motion (streaming data). Topics include the definition of Big Data and its 5 V's: Volume, Velocity, Variety, Veracity, and Value. Architectures of distributed databases and storage, and ecosystems such as Hadoop and Spark, are covered, followed by an introduction to Scala, Spark-Shell, and PySpark.
This course is part of the Professional Development Certificate in Data Science and Machine Learning.
Prerequisites
Corequisites
Schedule
This course was not offered during the Winter 2025 term.
This course was not offered during the Spring/Summer 2025 term.
The tentative timetable is not yet available for the Fall 2025 term.
The tentative timetable is not yet available for the Winter 2026 term.