Data Engineering with Databricks
This course prepares data professionals to leverage the Databricks Lakehouse Platform to productionize ETL pipelines. Students will use Delta Live Tables to define and schedule pipelines that incrementally process new data from a variety of data sources into the Lakehouse. Students will also orchestrate tasks with Databricks Workflows and promote code with Databricks Repos.
Prerequisites for both versions of the course (Spark SQL and PySpark):
- Beginner familiarity with cloud computing concepts (virtual machines, object storage, etc.)
- Production experience working with data warehouses and data lakes
- Familiarity with basic SQL concepts (select, filter, group by, join, etc.)
Additional prerequisites for the Python version of this course (PySpark):
- Beginner programming experience with Python (syntax, conditions, loops, functions)
- Beginner programming experience with the Spark DataFrame API (see the sketch after this list):
  - Configure DataFrameReader and DataFrameWriter to read and write data
  - Express query transformations using DataFrame methods and Column expressions
  - Navigate the Spark documentation to identify built-in functions for various transformations and data types
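For readers unsure whether they meet the PySpark prerequisite, here is a minimal sketch of the kind of code the course assumes you can already read and write: a DataFrameReader load, a few Column-expression transformations, and a DataFrameWriter save. The input path and column names ("device", "value") are hypothetical placeholders, not course materials.

```python
# Minimal PySpark sketch of the prerequisite DataFrame skills.
# The input path and columns ("device", "value") are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("prereq-check").getOrCreate()

# DataFrameReader: configure a source format and load data
events = spark.read.format("json").load("/tmp/events.json")

# DataFrame methods + Column expressions: select, filter, group by
summary = (
    events.select("device", "value")
          .filter(F.col("value") > 0)               # keep positive readings
          .groupBy("device")                        # one row per device
          .agg(F.avg("value").alias("avg_value"))   # built-in aggregate function
)

# DataFrameWriter: configure an output format/mode and persist the result
summary.write.format("parquet").mode("overwrite").save("/tmp/device_summary")
```

If code like this reads naturally, you likely meet the DataFrame API prerequisite; if not, consider the Spark SQL version of the course.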
Registration options
Databricks has a delivery method for wherever you are on your learning journey
Self-Paced
Custom-fit learning paths for data, analytics, and AI roles and career paths, delivered through on-demand videos
Register now
Instructor-Led
Public and private courses taught by expert instructors, ranging in length from half a day to two days
Register now
Blended Learning
Self-paced content plus weekly instructor-led sessions for every style of learner, to optimize course completion and knowledge retention. Go to the Subscriptions Catalog tab to purchase
Purchase now
Skills@Scale
Comprehensive training offering for large-scale customers that includes learning elements for every style of learner. Inquire with your account executive for details