Databricks Performance Optimization
In this course, you’ll learn how to optimize workloads and physical data layout with Spark and Delta Lake, and how to analyze the Spark UI to assess performance and debug applications. Topics include streaming, liquid clustering, data skipping, caching, Photon, and more.
Note: This course is part of the 'Advanced Data Engineering with Databricks' course series.
The content was developed for participants with the following skills, knowledge, and abilities (a brief illustrative snippet follows the list):
- Ability to perform basic code development tasks using Databricks (create clusters, run code in notebooks, use basic notebook operations, import repos from git, etc.)
- Intermediate programming experience with PySpark
- Ability to extract data from a variety of file formats and data sources
- Ability to apply common transformations to clean data
- Ability to reshape and manipulate complex data using advanced built-in functions
- Intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions, etc.)
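To gauge whether you meet the Delta Lake and PySpark prerequisites, here is a minimal sketch of the operations listed above. It assumes a Databricks notebook where a SparkSession named `spark` is already provided and the catalog and schema exist; the table and column names are hypothetical.

```python
# Sketch of prerequisite Delta Lake operations; assumes a Databricks notebook
# (where `spark` is predefined) and a hypothetical table main.demo.users.
from delta.tables import DeltaTable

# Create a Delta table (a complete update would overwrite it).
(spark.range(1000)
     .withColumnRenamed("id", "user_id")
     .write.format("delta")
     .mode("overwrite")
     .saveAsTable("main.demo.users"))

# Incremental update: merge new rows into the existing table.
updates = spark.range(900, 1100).withColumnRenamed("id", "user_id")
target = DeltaTable.forName(spark, "main.demo.users")
(target.alias("t")
       .merge(updates.alias("s"), "t.user_id = s.user_id")
       .whenNotMatchedInsertAll()
       .execute())

# Compact small files, then restore a previous version of the table.
spark.sql("OPTIMIZE main.demo.users")
spark.sql("RESTORE TABLE main.demo.users TO VERSION AS OF 0")
```

If these operations are unfamiliar, consider completing a foundational Delta Lake or PySpark course first.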
Registration options
Databricks has a delivery method for wherever you are on your learning journey
Self-Paced
Custom-fit learning paths for data, analytics, and AI roles and career paths through on-demand videos
Instructor-Led
Public and private courses taught by expert instructors, ranging in length from half a day to two days
Blended Learning
Self-paced content combined with weekly instructor-led sessions for every style of learner, designed to optimize course completion and knowledge retention. Go to the Subscriptions Catalog tab to purchase.
Skills@Scale
Comprehensive training offering for large-scale customers that includes learning elements for every learning style. Inquire with your account executive for details.