
In the age of ‘Big Data,’ with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, data science techniques are fast becoming core components of large-scale data processing pipelines.

Apache Spark offers analysts and engineers a powerful tool for building these pipelines, and learning to build them will soon be a lot easier. Databricks is excited to be working with professors from the University of California, Berkeley and the University of California, Los Angeles to produce two new Massive Open Online Courses (MOOCs). Both courses will be freely available on the edX MOOC platform in summer 2015. edX Verified Certificates are also available for a fee.

Introduction to Big Data with Apache Spark

The first course, Introduction to Big Data with Apache Spark, will teach students about Apache Spark and how to perform data analysis with it. Students will learn how to apply data science techniques using parallel programming in Spark to explore big (and small) data. The course will include hands-on programming exercises, including Log Mining, Textual Entity Recognition, and Collaborative Filtering, that teach students how to manipulate data sets using parallel processing with PySpark (part of Apache Spark). The course is also designed to help prepare students for taking the Spark Certified Developer exam. The course is being taught by Anthony Joseph, a professor at UC Berkeley and technical advisor at Databricks, and will start on June 1st, 2015.

The second course, Scalable Machine Learning, introduces the underlying statistical and algorithmic principles required to develop scalable machine learning pipelines, and provides hands-on experience using PySpark. It presents an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. Students will use Spark to implement scalable algorithms for fundamental statistical models while tackling real-world problems from various domains. The course is being taught by Ameet Talwalkar, an assistant professor at UCLA and technical advisor at Databricks, and will start on June 29th, 2015.

Both courses are available for free on the edX website: https://www.edx.org/
