Incremental Processing on Large Analytical Datasets - Databricks


Prasanna Rajaperumal and Vinoth Chandar will explore a specific problem of ingesting petabytes of data at Uber and why they ended up building an analytical datastore from scratch using Spark. Prasanna will discuss the design choices and implementation approaches taken in building Hoodie to provide near-real-time data ingestion and querying using Spark and HDFS. Session hashtag: #SFexp4
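The core idea behind near-real-time ingestion, as described in the abstract, is upsert semantics: incoming records are merged into a base table by record key, so queries always see the latest version of each row. The following is a minimal illustrative sketch of that merge semantics in plain Python — Hoodie itself implements this at scale on Spark and HDFS with indexed file groups, and the field names below (`id`, `fare`) are made up for the example.

```python
def upsert(base, incoming, key="id"):
    """Merge incoming records into base, replacing rows with matching keys."""
    # Index the base table by record key, then apply incoming rows:
    # a new key is an insert, an existing key is an update.
    merged = {row[key]: row for row in base}
    for row in incoming:
        merged[row[key]] = row
    return list(merged.values())

base = [{"id": 1, "fare": 10.0}, {"id": 2, "fare": 7.5}]
incoming = [{"id": 2, "fare": 8.0}, {"id": 3, "fare": 12.0}]

result = upsert(base, incoming)
# Row 2 is updated in place and row 3 is appended; row 1 is untouched.
print(result)
```

This avoids rewriting the entire dataset for each batch of updates, which is what makes incremental processing attractive over full recomputation on petabyte-scale tables.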

About Prasanna Padmanabhan

Prasanna is currently an engineer on the Personalization Infrastructure team at Netflix. His primary focus is on building big data infrastructure components using Spark that help algorithmic engineers innovate faster and improve personalization for Netflix members. In the past, he has built distributed data systems that leverage both batch and stream processing.

About Vinoth Chandar

Vinoth is the founding engineer/architect of the data team at Uber, as well as the author of many data processing and querying systems at Uber, including "Hoodie". He has a keen interest in unified architectures for data analytics and processing. Previously, Vinoth was the lead on LinkedIn's Voldemort key-value store and has also worked on the Oracle Database replication engine, HPC, and stream processing.