Scaling Machine Learning Pipelines with Apache Spark

May 25, 2021 01:00 PM (PT)

Learn to integrate machine learning solutions with scalable production pipelines backed by Apache Spark by:

  • Investigating common inefficiencies in machine learning
  • Scaling development and tuning of machine learning models using Spark MLlib and Hyperopt
  • Parallelizing model training and inference with Pandas UDFs and the Pandas Function APIs


Prerequisites:

  • Beginner-level experience with the PySpark DataFrame API
  • Intermediate experience with Python


Role: ML Engineer, Data Scientist

Duration: Half-day

Labs: Yes