Monitoring and Optimizing Apache Spark Workloads on Databricks
This course explores the Lakehouse architecture and Medallion design for scalable data workflows, focusing on Unity Catalog for secure data governance, access control, and lineage tracking. The curriculum includes building reliable, ACID-compliant pipelines with Delta Lake. You'll examine Spark optimization techniques such as partitioning, caching, and query tuning, and learn performance monitoring, troubleshooting, and best practices for efficient data engineering and analytics that address real-world challenges.
Prerequisites
- Basic programming knowledge
- Familiarity with Python
- Basic understanding of SQL queries (SELECT, JOIN, GROUP BY)
- Familiarity with data processing concepts
- No prior Spark or Databricks experience required
Outline
- Apache Spark and Databricks
- Using Apache Spark with Delta Lake
- Demo: Introduction to Delta Lake
- Lab: Introduction to Delta Lake
- Optimizing Apache Spark
- Demo: Optimizing Apache Spark
- Lab: Optimizing Apache Spark
Registration options
Databricks has a delivery method for wherever you are on your learning journey.
Self-Paced
Custom-fit learning paths for data, analytics, and AI roles, delivered through on-demand videos
Instructor-Led
Public and private courses, from half-day to two-day, taught by expert instructors
Blended Learning
Self-paced courses plus weekly instructor-led sessions for every style of learner, designed to maximize course completion and knowledge retention. Go to the Subscriptions Catalog tab to purchase.
Skills@Scale
Comprehensive training offering for large-scale customers that includes learning elements for every style of learning. Inquire with your account executive for details.