Data Engineering with Databricks
This course prepares data professionals to leverage the Databricks Lakehouse Platform to productionize ETL pipelines. Students will use Delta Live Tables to define and schedule pipelines that incrementally process new data from a variety of data sources into the Lakehouse. Students will also orchestrate tasks with Databricks Workflows and promote code with Databricks Repos.
Prerequisites for both versions of the course (Spark SQL and PySpark):
- Beginner familiarity with cloud computing concepts (virtual machines, object storage, etc.)
- Production experience working with data warehouses and data lakes
- Familiarity with basic SQL concepts (select, filter, group by, join, etc.)
Additional prerequisites for the Python version of this course (PySpark):
- Beginner programming experience with Python (syntax, conditions, loops, functions)
- Beginner programming experience with the Spark DataFrame API (see the sketch after this list):
  - Configure DataFrameReader and DataFrameWriter to read and write data
  - Express query transformations using DataFrame methods and Column expressions
  - Navigate the Spark documentation to identify built-in functions for various transformations and data types
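For readers unsure whether they meet the PySpark prerequisite, here is a minimal sketch of the kind of code the course assumes you can already read and write: a DataFrameReader load, a few Column-expression transformations, and a DataFrameWriter save. The input path and column names ("device", "value") are hypothetical placeholders, not course materials.

```python
# Minimal PySpark sketch of the prerequisite DataFrame skills.
# The input path and columns ("device", "value") are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("prereq-check").getOrCreate()

# DataFrameReader: configure a source format and load data
events = spark.read.format("json").load("/tmp/events.json")

# DataFrame methods + Column expressions: select, filter, group by
summary = (
    events.select("device", "value")
          .filter(F.col("value") > 0)               # keep positive readings
          .groupBy("device")                        # one row per device
          .agg(F.avg("value").alias("avg_value"))   # built-in aggregate function
)

# DataFrameWriter: configure an output format/mode and persist the result
summary.write.format("parquet").mode("overwrite").save("/tmp/device_summary")
```

If code like this reads naturally, you likely meet the DataFrame API prerequisite; if not, consider the Spark SQL version of the course.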
Registration options
Databricks has a delivery method for wherever you are on your learning journey
Self-Paced
Custom-fit learning paths for data, analytics, and AI roles and career paths, delivered through on-demand videos
Register now
Instructor-Led
Public and private courses taught by expert instructors, ranging in length from half a day to two days
Register now
Blended Learning
Self-paced content plus weekly instructor-led sessions for every style of learner, to optimize course completion and knowledge retention. Go to the Subscriptions Catalog tab to purchase
Purchase now
Skills@Scale
Comprehensive training offering for large-scale customers that includes learning elements for every style of learner. Inquire with your account executive for details