
Data Ingestion with LakeFlow Connect

Overview

Experience: In Person
Type: Paid Training
Duration: 240 min

In this course, you'll learn how to ingest data efficiently with LakeFlow Connect and manage that data with Databricks. We'll cover ingestion with built-in connectors for popular SaaS applications, databases, and file sources; ingestion from cloud object storage; and batch and streaming ingestion. LakeFlow Connect is fully integrated with the Data Intelligence Platform, including unified governance, observability, and Delta Lake as the foundation of a data lakehouse architecture. For each type of connector, we'll cover the connector components, setting up the pipeline, validating the source, and mapping to the destination. We'll also cover how to ingest data, batch or streaming, into Delta tables using the UI with Auto Loader, by automating ETL with Lakeflow Pipelines (previously Delta Live Tables), or via the API.
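The Auto Loader path mentioned above uses Spark Structured Streaming's cloudFiles source to incrementally detect and ingest new files from cloud object storage into a Delta table. Below is a minimal PySpark sketch of that pattern; the bucket paths and target table name are placeholders, not course materials:

    # Incrementally ingest new JSON files from cloud object storage into a
    # Delta table with Auto Loader. All paths and names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    (spark.readStream
        .format("cloudFiles")                          # Auto Loader source
        .option("cloudFiles.format", "json")           # format of the incoming files
        .option("cloudFiles.schemaLocation",           # where Auto Loader tracks the
                "s3://my-bucket/_schemas/orders")      # inferred, evolving schema
        .load("s3://my-bucket/raw/orders/")            # directory monitored for new files
        .writeStream
        .option("checkpointLocation",                  # progress tracking for
                "s3://my-bucket/_checkpoints/orders")  # exactly-once ingestion
        .trigger(availableNow=True)                    # process all pending files, then stop
        .toTable("main.ingest.orders_bronze"))         # target Delta table

The same pipeline runs as a scheduled batch job with trigger(availableNow=True) or as a continuous stream by omitting the trigger, which is the batch-versus-streaming distinction the course draws.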


Prerequisites:
- Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks)
- Familiarity with cloud computing concepts (virtual machines, object storage, etc.)
- Production experience working with data warehouses and data lakes
- Intermediate experience with basic SQL concepts (select, filter, group by, join, etc.)
- Beginner programming experience with Python (syntax, conditions, loops, functions)
- Beginner programming experience with the Spark DataFrame API (configuring DataFrameReader and DataFrameWriter to read and write data, expressing query transformations using DataFrame methods and Column expressions, etc.)


Labs: Yes

Certification Path: Databricks Certified Data Engineer Associate