Session
Accelerating Data Ingestion with New Innovations in Auto Loader’s Performance and Schema Evolution
Overview
| Field | Value |
|---|---|
| Experience | In Person |
| Type | Breakout |
| Track | Data Engineering and Streaming |
| Industry | Enterprise Technology, Retail and CPG - Food, Financial Services |
| Technologies | Apache Spark, LakeFlow |
| Skill Level | Intermediate |
| Duration | 40 min |

Auto Loader is a powerful Structured Streaming source connector from Lakeflow Connect, trusted by more than 4,000 Databricks customers to ingest multiple petabytes of file data from cloud storage every day. In this session, we will explore key enhancements to Auto Loader’s performance and schema evolution capabilities, including:
- Accelerated ingestion of millions of files by parallelizing file reading and processing within a single Apache Spark™ task
- Faster parallelized processing of large compressed and multi-line files
- Improved file discovery and state management for enhanced scalability
- Advanced schema evolution with type widening support
- Support for the Variant data type, providing greater flexibility in handling evolving schemas
You will gain insights into how these enhancements can help overcome data schema challenges while building more performant, scalable, and cost-effective ingestion pipelines with Lakeflow Connect.
Session Speakers
Elise Georis
Staff Product Manager
Databricks
Sandip Agarwala
Databricks