SESSION

Drastically Reducing Processing Costs with Delta Lake

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Lakehouse Architecture
INDUSTRY: Enterprise Technology, Travel and Hospitality
TECHNOLOGIES: Apache Spark, Delta Lake, ETL
SKILL LEVEL: Intermediate
DURATION: 40 min

Amadeus is a global technology company providing solutions for the travel industry, including data analytics tools that help airlines, hotels, and travel agencies. Our team transforms raw data from different business units into easy-to-use Delta Lake-based star schemas. This is particularly challenging because customers demand several years of historical data and high refresh rates, all at low infrastructure cost. In this session, we will discuss how we drastically optimized our data pipelines using state-of-the-art features on Databricks. We will describe our successful collaboration with Databricks and present the tools and methodology that helped us identify and understand our read and write amplification issues. We will also highlight the features and techniques that were essential in addressing these challenges, including Predictive I/O, Photon, Deletion Vectors, Partition Pruning, and Dynamic File Pruning.

SESSION SPEAKERS

Mauricio Jost

Principal Data Engineer
Amadeus

Generoso Pagano

Principal Data Engineer
Amadeus