INSTRUCTOR-LED

Databricks Delta

DB 200

OVERVIEW

This 1-day course is for data engineers, architects, data scientists, and software engineers who want to use Databricks Delta for ETL processing on data lakes. The course ends with a capstone project in which students build a complete data pipeline using Databricks Delta.

Each topic includes lecture content along with hands-on labs in the Databricks notebook environment. Students may keep the notebooks and continue to use them with the free Databricks Community Edition offering after the class ends; the course examples are designed to run in that environment.

OBJECTIVES

After taking this class, students will be able to:

  • Use the interactive Databricks notebook environment.
  • Use Databricks Delta to create, append, and upsert data into a data lake.
  • Use Databricks Delta to manage a data lake and extract actionable insights from it.
  • Use Databricks Delta’s advanced optimization features to speed up queries.
  • Use Databricks Delta to seamlessly ingest streaming and historical data.
  • Implement a Databricks Delta data pipeline architecture.

PLATFORMS

Supported platforms include Azure Databricks, Databricks Community Edition, and non-Azure Databricks.

  • If you’re planning to use the course on Azure Databricks, select the “Azure Databricks” Platform option.
  • If you’re planning to use the course on Databricks Community Edition or on a non-Azure version of Databricks, select the “Other Databricks” Platform option.

TOPICS

  • Create
    • Work with a traditional data pipeline using online shopping data
    • Identify problems with the traditional data pipeline
    • Use Databricks Delta features to mitigate those problems
  • Append
    • Append new records to a Databricks Delta table
  • Upsert
    • Use Databricks Delta to UPSERT data into existing Databricks Delta tables
  • Streaming
    • Read streaming data and write it into a data lake
  • Optimization
    • Optimize a Databricks Delta data pipeline backed by online shopping data
    • Learn about best practices to apply to data pipelines
  • Architecture
    • Get streaming Wikipedia data into a data lake via a Kafka broker
    • Write streaming data into a raw (bronze) table
    • Clean up the bronze data and generate normalized query tables
    • Create summary tables of key business metrics
    • Create plots/dashboards of business metrics

Details

  • Duration: 1 Day
  • Hours: 9:00 a.m. – 5:00 p.m.

Target Audience

Data engineers, software engineers, DevOps engineers, IT operations staff, and team leads with experience using Databricks.

Prerequisites

Completion of the Getting Started with Apache Spark™ SQL, Getting Started with Apache Spark™ DataFrames, or ETL Part 1 course, or equivalent knowledge

Lab Requirements

  • A computer or laptop
  • Chrome or Firefox web browser (Internet Explorer and Safari are not supported)
  • Internet access with unfettered connections to the following domains:
    1. *.databricks.com - required
    2. *.slack.com - highly recommended
    3. spark.apache.org - required
    4. drive.google.com - helpful but not required

Course Syllabus

Each module pairs lecture topics with hands-on labs:

Create

  • Lecture:
    • Identify problems with the traditional data pipeline
    • Use Databricks Delta features to mitigate those problems
  • Hands-on:
    • Investigate Delta transaction logs
    • Create records and insert them into a Databricks Delta data pipeline using online shopping data (see the sketch after this list)
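
As a taste of the hands-on material, here is a minimal sketch of creating a Delta table in PySpark. The path, schema, and sample rows are hypothetical placeholders, not the course dataset.

    # Minimal sketch: create a Delta table (hypothetical path and schema).
    # In Databricks notebooks a SparkSession named `spark` is predefined;
    # getOrCreate() simply returns it.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    orders = spark.createDataFrame(
        [(1, "kettle", 24.99), (2, "toaster", 34.99)],
        ["order_id", "item", "price"],
    )

    # Writing with format("delta") creates the transaction log under
    # <path>/_delta_log/, the log investigated in this module.
    orders.write.format("delta").mode("overwrite").save("/tmp/delta/orders")

    # Register the files as a table so they can be queried with SQL.
    spark.sql("CREATE TABLE IF NOT EXISTS orders USING DELTA LOCATION '/tmp/delta/orders'")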

Append

  • Lecture:
    • Append new records to a Databricks Delta table
  • Hands-on:
    • Append records to a Databricks Delta data pipeline using online shopping data (see the sketch after this list)
    • Work with Internet-of-Things data
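
Appending to a Delta table is an ordinary write in append mode. A minimal sketch, continuing the hypothetical orders table from the Create sketch:

    # Minimal sketch: append records to an existing Delta table
    # (hypothetical table from the Create sketch; `spark` as above).
    new_orders = spark.createDataFrame(
        [(3, "blender", 49.99)],
        ["order_id", "item", "price"],
    )

    # mode("append") adds new data files and commits them atomically to
    # the Delta transaction log, so readers never see partial writes.
    new_orders.write.format("delta").mode("append").save("/tmp/delta/orders")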

Upsert

  • Lecture:
    • Use Databricks Delta to UPSERT data into existing Databricks Delta tables
  • Hands-on:
    • UPSERT data into existing Databricks Delta tables (see the sketch after this list)
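
An upsert combines UPDATE and INSERT in a single atomic MERGE statement. A minimal sketch against the same hypothetical orders table:

    # Minimal sketch: upsert via MERGE INTO (hypothetical tables; `spark`
    # and the orders table as in the Create sketch).
    updates = spark.createDataFrame(
        [(2, "toaster", 29.99),  # existing order_id: row gets updated
         (4, "mixer", 59.99)],   # new order_id: row gets inserted
        ["order_id", "item", "price"],
    )
    updates.createOrReplaceTempView("updates")

    spark.sql("""
        MERGE INTO orders AS t
        USING updates AS s
        ON t.order_id = s.order_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)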

Streaming

  • Lecture:
    • Learn how to use Databricks Delta with Structured Streaming to ingest batch and streaming data into the same locations
  • Hands-on:
    • Work with Internet-of-Things data (see the sketch after this list)
    • Visualize output in live plots
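
A Delta table can serve as a streaming sink and a batch source at the same time, which is what lets batch and streaming data share one location. A minimal sketch, using Spark's built-in rate source as a stand-in for the course's IoT stream:

    # Minimal sketch: stream into a Delta table, then read the same path
    # as a batch table (`spark` as above; paths are hypothetical).
    query = (spark.readStream
             .format("rate")  # test source emitting (timestamp, value) rows
             .load()
             .writeStream
             .format("delta")
             .option("checkpointLocation", "/tmp/delta/events/_checkpoint")
             .outputMode("append")
             .start("/tmp/delta/events"))

    # The same location is also readable as an ordinary batch table.
    spark.read.format("delta").load("/tmp/delta/events").count()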

Optimization

  • Lecture:
    • Use Databricks Delta's advanced optimization features to speed up queries
    • Learn about best practices to apply to data pipelines
  • Hands-on:
    • Optimize a Databricks Delta data pipeline backed by online shopping data (see the sketch after this list)
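
Two of the optimization features covered, sketched against the hypothetical orders table:

    # Minimal sketch of Databricks Delta optimizations (hypothetical table).
    # OPTIMIZE compacts many small files into fewer large ones; ZORDER BY
    # co-locates rows with similar values of the listed columns.
    spark.sql("OPTIMIZE orders ZORDER BY (order_id)")

    # VACUUM deletes files no longer referenced by the transaction log,
    # here keeping the default 7-day (168-hour) retention window.
    spark.sql("VACUUM orders RETAIN 168 HOURS")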

Architecture

  • Lecture:
    • Learn about the Databricks Delta Architecture
    • Discuss trade-offs with the Lambda architecture
  • Hands-on:
    • Get streaming Wikipedia data into a data lake via a Kafka broker
    • Write streaming data into a raw (bronze) table
    • Clean up the bronze data and generate normalized query (silver) tables
    • Create summary (gold) tables of key business metrics
    • Create plots/dashboards of business metrics (a sketch of the bronze-to-silver steps follows)
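
A minimal sketch of the raw-to-query (bronze-to-silver) portion of such a pipeline. The broker address, topic name, and paths are hypothetical placeholders, not the course's Kafka configuration:

    # Minimal sketch: bronze and silver stages (`spark` as above; broker,
    # topic, and paths are hypothetical).
    from pyspark.sql.functions import col

    # Bronze: land the raw Kafka stream, unparsed, in a Delta table.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "kafka.example.com:9092")
           .option("subscribe", "wikipedia")
           .load())

    (raw.selectExpr("CAST(value AS STRING) AS json", "timestamp")
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/tmp/delta/wiki/bronze/_checkpoint")
        .outputMode("append")
        .start("/tmp/delta/wiki/bronze"))

    # Silver: stream from the bronze table, drop malformed records, and
    # write a normalized query table.
    (spark.readStream.format("delta").load("/tmp/delta/wiki/bronze")
          .filter(col("json").isNotNull())
          .writeStream
          .format("delta")
          .option("checkpointLocation", "/tmp/delta/wiki/silver/_checkpoint")
          .outputMode("append")
          .start("/tmp/delta/wiki/silver"))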