Scaling Machine Learning Pipelines with Apache Spark

May 25, 2021 09:00 AM (PT)

Role: ML Engineer, Data Scientist

Duration: Half-day

Labs: Yes


Integrate machine learning solutions with scalable production pipelines backed by Apache Spark through:

  • Investigating common inefficiencies in machine learning
  • Scaling development and tuning of machine learning models using Spark MLlib and Hyperopt
  • Parallelizing model training and inference with Pandas UDFs and the Pandas Function APIs


Prerequisites:

  • Beginning experience with the PySpark DataFrame API
  • Intermediate experience with Python