Pranav Anand

Software Engineer, Databricks

Pranav Anand is a Software Engineer at Databricks. He has been developing Auto Loader and Delta Lake to simplify the lives of data engineers. Pranav received his BS in Computer Science from the University of Waterloo.

Past sessions

Continuously and incrementally ingesting data as it arrives in cloud storage has become a common workflow in our customers' ETL pipelines. However, managing this workflow is rife with challenges, such as scalable and efficient file discovery, schema inference and evolution, and fault tolerance with exactly-once guarantees. Auto Loader is a new Structured Streaming source in Databricks that serves as our all-in-one solution to these challenges.
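
Efficient file discovery with exactly-once guarantees comes down to remembering which files have already been ingested. The sketch below is a hypothetical, single-machine illustration of that bookkeeping, not Auto Loader's implementation or API: `IncrementalFileSource`, `next_batch`, and `commit` are invented names, and a JSON file of committed paths stands in for the far more scalable state store that tracking billions of files would require.

```python
import json
import os
from pathlib import Path


class IncrementalFileSource:
    """Hypothetical sketch: offer each file in a directory until it is committed.

    Committed paths are persisted in a JSON checkpoint, so a restarted source
    skips files that were already ingested. Illustrative only; this is not how
    Auto Loader is implemented.
    """

    def __init__(self, input_dir, checkpoint_path):
        self.input_dir = Path(input_dir)
        self.checkpoint_path = Path(checkpoint_path)
        self.committed = self._load_checkpoint()

    def _load_checkpoint(self):
        # Recover the set of already-ingested paths after a restart.
        if self.checkpoint_path.exists():
            return set(json.loads(self.checkpoint_path.read_text()))
        return set()

    def next_batch(self):
        """List the directory and return files not yet committed, sorted by path."""
        return sorted(
            str(p)
            for p in self.input_dir.iterdir()
            if p.is_file() and str(p) not in self.committed
        )

    def commit(self, files):
        """Record files as ingested; call only after they were processed."""
        self.committed.update(files)
        # Write a temporary file, then replace the checkpoint with os.replace,
        # so a crash never leaves a half-written checkpoint behind.
        tmp = self.checkpoint_path.with_name(self.checkpoint_path.name + ".tmp")
        tmp.write_text(json.dumps(sorted(self.committed)))
        os.replace(tmp, self.checkpoint_path)
```

A caller would process the files returned by `next_batch()` and only then call `commit()`. If the process crashes in between, the same files are offered again after a restart, so end-to-end exactly-once delivery additionally assumes the downstream write is idempotent or committed together with the checkpoint. Note also that every `next_batch()` call here re-lists the whole directory, which is the cost that file notifications and incremental listing are meant to avoid.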

In this talk, we’ll discuss how Auto Loader:

  • Can discover files efficiently through file notifications or incremental file listing
  • Can scale to tracking metadata for billions of files while still providing exactly-once processing guarantees
  • Can infer the schema of data and detect schema drift over time
  • Can evolve the schema of the data being processed
  • Is used within Databricks to efficiently ingest millions of files uploaded every hour
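
Schema inference, drift detection, and evolution can be illustrated with a toy type-widening scheme over JSON-like records. This is a minimal sketch under assumed rules (integers widen to doubles, any other conflict falls back to string, nulls carry no type, and evolution is purely additive); the function names are hypothetical, and Auto Loader's actual inference and evolution behavior is configured through its own options and may differ.

```python
def infer_type(value):
    """Map a JSON value to a simple type name (assumed type system)."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def merge_types(a, b):
    """Widen two types to one that can hold both (assumed widening rules)."""
    if a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"


def infer_schema(records):
    """Infer a column -> type mapping from a sample of JSON-like dicts."""
    schema = {}
    for record in records:
        for column, value in record.items():
            if value is None:
                continue  # a null carries no type information
            inferred = infer_type(value)
            if column in schema:
                schema[column] = merge_types(schema[column], inferred)
            else:
                schema[column] = inferred
    return schema


def detect_drift(current, incoming):
    """Return (new columns, columns whose type would have to widen)."""
    new_columns = {c: t for c, t in incoming.items() if c not in current}
    widened = {
        c: (current[c], merge_types(current[c], t))
        for c, t in incoming.items()
        if c in current and merge_types(current[c], t) != current[c]
    }
    return new_columns, widened


def evolve_schema(current, incoming):
    """Additive evolution: keep every existing column, add new ones, widen types."""
    evolved = dict(current)
    for column, t in incoming.items():
        evolved[column] = merge_types(evolved[column], t) if column in evolved else t
    return evolved
```

For example, if a first batch has integer `price` values and a later batch contains `4.5` plus a new `country` column, `detect_drift` reports `country` as new and `price` as widening from `long` to `double`, and `evolve_schema` returns a schema containing all four columns.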