SESSION

Uber's Batch Analytics Evolution from Hive to Spark

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Warehousing - Analytics and BI
INDUSTRY: Travel and Hospitality
TECHNOLOGIES: Apache Spark
SKILL LEVEL: Advanced
DURATION: 40 min

About 40% of Uber's substantial ETL expenses, amounting to multiple millions of dollars, were associated with ETL processes on Hive. Approximately 30,000 ETL workflows and two million weekly queries at Uber used Hive for purposes including ML, compliance and regulatory reporting, finance, and product development. To improve the efficiency of its batch analytics, Uber decided to migrate all Hive workloads to SparkSQL.

The migration involved building automation features: transpilation of Hive queries to SparkSQL, parallel execution on Spark, and a validation framework for data correctness and performance. This session will explore the architecture of Uber's auto-migration framework in depth, covering the challenges encountered during the migration and how they were resolved. It will also share the overall efficiency gains from the migration.

SESSION SPEAKERS

Kumudini Kakwani

Senior Software Engineer
Uber

Akshayaprakash Sharma

Senior Software Engineer
Uber