Session

Accelerating Data Ingestion with New Innovations in Auto Loader’s Performance and Schema Evolution

Overview

Experience: In Person
Type: Breakout
Track: Data Engineering and Streaming
Industry: Enterprise Technology, Retail and CPG - Food, Financial Services
Technologies: Apache Spark, LakeFlow
Skill Level: Intermediate
Duration: 40 min

Auto Loader is a powerful Structured Streaming source connector from Lakeflow Connect, trusted by more than 4,000 Databricks customers to ingest multiple petabytes of file data from cloud storage every day. In this session, we will explore key innovations and enhancements in Auto Loader’s performance and schema evolution capabilities, including:

  • Accelerated ingestion of millions of files by parallelizing file reading and processing within a single Apache Spark™ task
  • Faster parallelized processing of large compressed and multi-line files
  • Improved file discovery and state management for enhanced scalability
  • Advanced schema evolution with type widening support
  • Support for the Variant data type, providing greater flexibility in handling evolving schemas

You will gain insights into how these enhancements can help overcome data schema challenges while building more performant, scalable, and cost-effective ingestion pipelines with Lakeflow Connect.

Session Speakers

Elise Georis

Staff Product Manager
Databricks

Sandip Agarwala

Databricks