Building, Debugging, and Tuning Spark Machine Learning Pipelines


Machine Learning workflows involve complex sequences of data transformations, learning algorithms, and parameter tuning. Spark ML Pipelines, introduced in Spark 1.2, have grown into a powerful framework for developing ML workflows. This talk will cover basic Pipeline concepts and then demonstrate their usage:
(1) Building: Pipelines simplify the process of specifying an ML workflow.
(2) Debugging: Pipelines and DataFrames permit users to inspect and debug the workflow.
(3) Tuning: Built-in support for parameter tuning helps users optimize ML performance.
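
The three steps above can be sketched in a few lines of plain Python. This is a toy model of the pipeline idea only, not Spark's API: the classes `Clip`, `MidpointClassifier`, and `Pipeline`, the `trace` flag, and the data are all hypothetical and made up for illustration. Rows are dictionaries standing in for DataFrame rows, and `fit` returns the pipeline itself rather than a separate fitted model.

```python
# Toy, dependency-free sketch of the build / debug / tune cycle.
# All names here are hypothetical; none of this is Spark's API.

class Clip:
    """Transformer-like stage: caps the feature "x" at max_value."""
    def __init__(self, max_value):
        self.max_value = max_value

    def fit(self, rows):
        return self  # nothing to learn

    def transform(self, rows):
        return [{**r, "x": min(r["x"], self.max_value)} for r in rows]


class MidpointClassifier:
    """Estimator-like stage: learns a threshold halfway between class means."""
    def fit(self, rows):
        pos = [r["x"] for r in rows if r["label"] == 1]
        neg = [r["x"] for r in rows if r["label"] == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def transform(self, rows):
        return [{**r, "prediction": int(r["x"] > self.threshold)} for r in rows]


class Pipeline:
    """Building: a workflow is just an ordered list of stages."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        # Fit each stage on the output of the stages before it.
        for stage in self.stages:
            rows = stage.fit(rows).transform(rows)
        return self

    def transform(self, rows, trace=False):
        for stage in self.stages:
            rows = stage.transform(rows)
            if trace:
                # Debugging: look at the rows after every stage.
                print(type(stage).__name__, rows[:2])
        return rows


def accuracy(rows):
    return sum(r["prediction"] == r["label"] for r in rows) / len(rows)


# Made-up data; the 100.0 outlier makes the clipping parameter matter.
train = [{"x": x, "label": y} for x, y in
         [(1.0, 0), (2.0, 0), (3.0, 0), (8.0, 1), (9.0, 1), (100.0, 1)]]
valid = [{"x": x, "label": y} for x, y in
         [(2.5, 0), (7.0, 1), (9.5, 1), (1.5, 0)]]

# Tuning: try each candidate parameter value and score it on held-out rows.
scores = {}
for max_value in [2.0, 10.0, 1000.0]:
    model = Pipeline([Clip(max_value), MidpointClassifier()]).fit(train)
    scores[max_value] = accuracy(model.transform(valid))

best = max(scores, key=scores.get)
```

Calling `model.transform(valid, trace=True)` prints the first rows after each stage, which is the toy counterpart of inspecting an intermediate DataFrame. The loop over `max_value` is a hand-rolled grid search with a single hold-out set; a real tuner would typically use cross-validation over a grid of several parameters.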



About Joseph Bradley

Joseph Bradley works as a Solutions Architect at Databricks, specializing in Machine Learning, and is an Apache Spark committer and PMC member. Previously, he was a Staff Software Engineer at Databricks and a postdoc at UC Berkeley, after receiving his Ph.D. in Machine Learning from Carnegie Mellon.