Databricks Certification and Badging
The new standard for lakehouse training and certifications
Databricks Certified Data Engineer Associate
The Databricks Certified Data Engineer Associate certification exam assesses an individual’s ability to use the Databricks Lakehouse Platform to complete introductory data engineering tasks. This includes an understanding of the Lakehouse Platform and its workspace, its architecture, and its capabilities. It also assesses the ability to perform multi-hop architecture ETL tasks using Apache Spark SQL and Python in both batch and incrementally processed paradigms. Finally, the exam assesses the tester’s ability to put basic ETL pipelines and Databricks SQL queries and dashboards into production while maintaining entity permissions. Individuals who pass this certification exam can be expected to complete basic data engineering tasks using Databricks and its associated tools.
To earn this certification, candidates must pass a certification exam. To take the exam, please either log in or create an account on our certification platform.
This certification is part of the Data Engineer learning pathway.
Key details about the certification exam are provided below.
Minimally Qualified Candidate
The minimally qualified candidate should be able to:
- Understand how to use the Databricks Lakehouse Platform and its tools, and the benefits of doing so, including:
  - Data Lakehouse (architecture, descriptions, benefits)
  - Data Science and Engineering workspace (clusters, notebooks, data storage)
  - Delta Lake (general concepts, table management and manipulation, optimizations)
- Build ETL pipelines using Apache Spark SQL and Python, including:
  - Relational entities (databases, tables, views)
  - ELT (creating tables, writing data to tables, cleaning data, combining and reshaping tables, SQL UDFs)
  - Python (facilitating Spark SQL with string manipulation and control flow, passing data between PySpark and Spark SQL)
- Incrementally process data, including:
  - Structured Streaming (general concepts, triggers, watermarks)
  - Auto Loader (streaming reads)
  - Multi-hop Architecture (bronze-silver-gold, streaming applications)
  - Delta Live Tables (benefits and features)
- Build production pipelines for data engineering applications and Databricks SQL queries and dashboards, including:
  - Jobs (scheduling, task orchestration, UI)
  - Dashboards (endpoints, scheduling, alerting, refreshing)
- Understand and follow best security practices, including:
  - Unity Catalog (benefits and features)
  - Entity Permissions (team-based permissions, user-based permissions)
Testers will have 90 minutes to complete the certification exam.
There are 45 multiple-choice questions on the certification exam. The questions will be distributed by high-level topic in the following way:
- Databricks Lakehouse Platform – 24% (11/45)
- ELT with Spark SQL and Python – 29% (13/45)
- Incremental Data Processing – 22% (10/45)
- Production Pipelines – 16% (7/45)
- Data Governance – 9% (4/45)
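As a quick check of the arithmetic in the list above, the following minimal sketch recomputes each topic's percentage from its question count and derives the average time available per question from the 90-minute limit. The names QUESTION_COUNTS, EXAM_MINUTES, and topic_weights are illustrative only and are not part of any Databricks tool; the sketch assumes percentages are rounded to the nearest whole number.

```python
# Question counts per topic, copied from the distribution list above.
QUESTION_COUNTS = {
    "Databricks Lakehouse Platform": 11,
    "ELT with Spark SQL and Python": 13,
    "Incremental Data Processing": 10,
    "Production Pipelines": 7,
    "Data Governance": 4,
}

# Exam duration in minutes, as stated for the certification exam.
EXAM_MINUTES = 90


def topic_weights(counts):
    """Return each topic's share of the exam as a whole-number percentage."""
    total = sum(counts.values())
    return {topic: round(100 * n / total) for topic, n in counts.items()}


total_questions = sum(QUESTION_COUNTS.values())
weights = topic_weights(QUESTION_COUNTS)
minutes_per_question = EXAM_MINUTES / total_questions

for topic, pct in weights.items():
    print(f"{topic}: {pct}% ({QUESTION_COUNTS[topic]}/{total_questions})")
print(f"Average time per question: {minutes_per_question} minutes")
```

On average, this leaves two minutes per question across the 45 questions.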
Each attempt at the certification exam costs the tester $200. Testers may also be subject to tax depending on their location. Testers may retake the exam as many times as they like, but each attempt costs $200.
There are no test aids available during this exam.
The certification exam will provide data manipulation code in SQL when possible. In all other cases, code will be in Python.
Because of the speed at which the responsibilities of a data engineer and capabilities of the Databricks Lakehouse Platform change, this certification is valid for 2 years following the date on which each tester passes the certification exam.
To learn the content assessed by the certification exam, candidates should take one of the following Databricks Academy courses:
- Instructor-led: Data Engineering with Databricks
- Self-paced: Data Engineering with Databricks (available in Databricks Academy)
Candidates can also learn more about the certification exam by taking its overview course (coming soon).