Scalable Machine Learning with Apache Spark
Description
This course teaches you how to scale ML pipelines with Spark, including distributed training, hyperparameter tuning, and inference. You will build and tune ML models with SparkML while using MLflow to track, version, and manage these models. The course covers the latest ML features in Apache Spark, such as pandas UDFs, pandas function APIs, and the pandas API on Spark, as well as the latest Databricks ML product offerings, such as Feature Store and AutoML.
Duration
2 full days or 4 half days
Objectives
- Perform scalable EDA with Spark
- Build and tune machine learning models with SparkML
- Track, version, and deploy models with MLflow
- Perform distributed hyperparameter tuning with Hyperopt
- Use the Databricks Machine Learning workspace to create Feature Store tables and AutoML experiments
- Leverage the pandas API on Spark to scale your pandas code
Prerequisites
- Intermediate experience with Python
- Experience building machine learning models
- Familiarity with the PySpark DataFrame API
Logistics
- Zoom is our chosen online platform for delivering classes. Please make sure you can access Zoom before class.
- Please have one of the supported browsers installed.
Outline
Day 1
- Spark / ML overview
- Exploratory data analysis (EDA) and feature engineering with Spark
- Linear regression with SparkML: transformers, estimators, pipelines, and evaluators
- MLflow Tracking and Model Registry
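The SparkML abstractions listed for Day 1 follow a simple contract: an Estimator's `fit` learns from data and returns a fitted model, which is itself a Transformer; a Transformer's `transform` typically appends new columns; a Pipeline fits and chains its stages in order; and an Evaluator reduces predictions to a single metric. The minimal pure-Python sketch below imitates that contract on lists of row dicts so it runs without Spark. All class and function names in it are hypothetical; in PySpark the real counterparts live in `pyspark.ml` (for example `Pipeline`, `LinearRegression`, and `RegressionEvaluator` with `metricName="rmse"`).

```python
import math


class Transformer:
    """Maps a list of row dicts to a new list of row dicts."""

    def transform(self, rows):
        raise NotImplementedError


class Estimator:
    """Learns from rows and returns a fitted Transformer (a "model")."""

    def fit(self, rows):
        raise NotImplementedError


class CenteredFeature(Transformer):
    """Fitted stage: subtracts the learned mean of column "x"."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [{**r, "x_centered": r["x"] - self.mean} for r in rows]


class Centerer(Estimator):
    """Hypothetical estimator: learns the mean of column "x"."""

    def fit(self, rows):
        return CenteredFeature(sum(r["x"] for r in rows) / len(rows))


class LinearModel(Transformer):
    """Fitted stage: appends a "prediction" column."""

    def __init__(self, slope, intercept):
        self.slope, self.intercept = slope, intercept

    def transform(self, rows):
        return [{**r, "prediction": self.slope * r["x_centered"] + self.intercept}
                for r in rows]


class SimpleLinearRegression(Estimator):
    """Ordinary least squares on the single centered feature "x_centered"."""

    def fit(self, rows):
        xs = [r["x_centered"] for r in rows]
        ys = [r["y"] for r in rows]
        # With a centered feature, the OLS slope is sum(x*y) / sum(x*x)
        # and the intercept is simply the mean of y.
        slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return LinearModel(slope, sum(ys) / len(ys))


class PipelineModel(Transformer):
    """A chain of fitted stages applied in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Fits each Estimator on the output of the stages before it."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


def rmse(rows, label="y", prediction="prediction"):
    """Evaluator: root-mean-squared error between two columns."""
    return math.sqrt(sum((r[label] - r[prediction]) ** 2 for r in rows) / len(rows))


train = [{"x": x, "y": 2 * x + 1} for x in range(5)]
model = Pipeline([Centerer(), SimpleLinearRegression()]).fit(train)
print(model.transform([{"x": 10}])[0]["prediction"])  # 21.0
print(rmse(model.transform(train)))  # 0.0
```

The key design point the sketch tries to show is that fitting a pipeline produces a new, fully fitted pipeline model, so the same sequence of learned steps (here, the training mean and the regression coefficients) is reapplied unchanged to new data at prediction time.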
Day 2
- Tree-based models: hyperparameter tuning and parallelism
- Hyperopt for distributed hyperparameter tuning
- Databricks AutoML and Feature Store
- Integrating third-party packages (distributed XGBoost)
- Distributed inference of scikit-learn models with pandas UDFs
- Distributed training with the pandas function API
- Pandas API on Spark for data manipulation
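Distributed inference with pandas UDFs typically relies on one idea: Spark hands each task its partition in batches, so an expensive model load can happen once per task instead of once per row. In PySpark this is what a `pandas_udf` written with an `Iterator[pd.Series] -> Iterator[pd.Series]` type hint allows (Spark 3.0 and later). The sketch below only imitates that pattern with plain Python lists; it does not use Spark, pandas, or scikit-learn, and `load_model`, `predict_batches`, and `split_into_batches` are hypothetical names, with a simple linear function standing in for a trained model.

```python
from typing import Callable, Iterator, List

LOAD_COUNT = 0


def load_model() -> Callable[[float], float]:
    """Hypothetical stand-in for an expensive load, e.g. unpickling a trained model."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return lambda x: 2.0 * x + 1.0


def predict_batches(batches: Iterator[List[float]]) -> Iterator[List[float]]:
    """Mirrors the shape of an iterator-of-Series pandas UDF: one model load, many batches."""
    model = load_model()  # runs once per generator, i.e. once per partition here
    for batch in batches:
        yield [model(x) for x in batch]


def split_into_batches(values: List[float], batch_size: int) -> Iterator[List[float]]:
    """Simulates Spark handing a partition to the UDF in fixed-size batches."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


# Two hypothetical partitions, scored independently as separate Spark tasks would be.
partitions = [[0.0, 1.0, 2.0, 3.0, 4.0], [5.0, 6.0]]
predictions = []
for partition in partitions:
    for scored in predict_batches(split_into_batches(partition, batch_size=2)):
        predictions.extend(scored)

print(predictions)  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
print(LOAD_COUNT)   # 2: one model load per partition, across four batches
```

Seven rows arrive in four batches, yet the model is loaded only twice; with a real model, that amortized load is what makes batch-wise scoring cheaper than a row-at-a-time UDF.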
Want to Book a Class?
If your organization would like to request a private delivery of the course, please email our training operations team at training-ops@databricks.com with your request.
If a public class is full and you cannot find another class that suits your schedule, please send an email to our training operations team to request a class.
We will do our best to accommodate your needs.
If you have any questions, please refer to our Frequently Asked Questions page.