Databricks Certification and Badging
The new standard for lakehouse training and certifications
Databricks Certified Machine Learning Associate
The Databricks Certified Machine Learning Associate certification exam assesses an individual’s ability to use Databricks to perform basic machine learning tasks. This includes the ability to understand and use Databricks Machine Learning and its capabilities, such as AutoML, Feature Store, and select capabilities of MLflow. It also assesses the ability to make correct decisions in machine learning workflows and to implement those workflows using Spark ML. Finally, it assesses the ability to understand advanced characteristics of scaling machine learning models. Individuals who pass this certification exam can be expected to complete basic machine learning tasks using Databricks and its associated tools.
To earn this certification, candidates must pass the certification exam. To get started, please either log in or create an account in our certification platform.
This certification is part of the Machine Learning learning pathway.
This certification exam is set to be released in the near future. Details about the certification exam are provided below. Please note that these details are subject to change.
Minimally Qualified Candidate
The minimally qualified candidate should be able to:
- Use Databricks Machine Learning and its capabilities within machine learning workflows, including:
  - Databricks Machine Learning (clusters, Repos, Jobs)
  - Databricks Runtime for Machine Learning (basics, libraries)
  - AutoML (classification, regression, forecasting)
  - Feature Store (basics)
  - MLflow (Tracking, Models, Model Registry)
- Implement correct decisions in machine learning workflows, including:
  - Exploratory data analysis (summary statistics, outlier removal)
  - Feature engineering (missing value imputation, one-hot encoding)
  - Tuning (hyperparameter basics, hyperparameter parallelization)
  - Evaluation and selection (cross-validation, evaluation metrics)
- Implement machine learning solutions at scale using Spark ML and other tools, including:
  - Distributed ML concepts
  - Spark ML Modeling APIs (data splitting, training, evaluation, estimators vs. transformers, pipelines)
  - Pandas API on Spark
  - Pandas UDFs and Pandas Function APIs
- Understand advanced scaling characteristics of classical machine learning models, including:
  - Distributed Linear Regression
  - Distributed Decision Trees
  - Ensembling Methods (bagging, boosting)
Testers will have 90 minutes to complete the certification exam.
There are 45 multiple-choice questions on the certification exam. The questions will be distributed by high-level topic in the following way:
- Databricks Machine Learning – 29% (13/45)
- ML Workflows – 29% (13/45)
- Spark ML – 33% (15/45)
- Scaling ML Models – 9% (4/45)
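As a quick check of the distribution above, the following Python sketch recomputes each percentage from the question counts. The variable names (`topic_counts`, `topic_percentages`) are illustrative only and are not part of any Databricks tooling; the counts themselves are copied from the list above.

```python
# Question counts per high-level topic, copied from the list above.
topic_counts = {
    "Databricks Machine Learning": 13,
    "ML Workflows": 13,
    "Spark ML": 15,
    "Scaling ML Models": 4,
}

# Total number of questions on the exam.
total_questions = sum(topic_counts.values())

# Share of the exam per topic, rounded to the nearest whole percent.
topic_percentages = {
    topic: round(100 * count / total_questions)
    for topic, count in topic_counts.items()
}

for topic, pct in topic_percentages.items():
    print(f"{topic}: {pct}% ({topic_counts[topic]}/{total_questions})")
```

The counts sum to 45 and the rounded percentages sum to 100, which matches the published breakdown. With 90 minutes for 45 questions, testers have an average of 2 minutes per question.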
Each attempt at the certification exam costs the tester $200. Testers may be subject to taxes depending on their location. Testers may retake the exam as many times as they like, but they must pay $200 for each attempt.
There are no test aids available during this exam.
All machine learning code within this exam will be in Python. For workflows or code not specific to machine learning tasks, data manipulation code may be provided in SQL.
Because of the speed at which the responsibilities of a machine learning practitioner and capabilities of the Databricks Lakehouse Platform change, this certification is valid for 2 years following the date on which each tester passes the certification exam.
To learn the content assessed by the certification exam, candidates should take one of the following Databricks Academy courses:
- Instructor-led: Scalable Machine Learning with Apache Spark
- Self-paced (available in Databricks Academy): Scalable Machine Learning with Apache Spark
Candidates can also learn more about the certification exam by taking the exam’s overview course (coming soon).