Zipline: Airbnb's Machine Learning Data Management Platform - Databricks

Zipline is Airbnb’s data management platform designed specifically for ML use cases. Previously, ML practitioners at Airbnb spent roughly 60% of their time collecting data and writing transformations for machine learning tasks. Zipline reduces this work from months to days. Users define features in an easy-to-use configuration language, and Zipline then provides the following capabilities: resource-efficient, point-in-time correct training set backfills and scheduled updates; feature visualizations and automatic data quality monitoring; feature availability in the online scoring environment, through batch and through streaming with batch correction (lambda architecture); collaboration and sharing of features; and data ownership and management.
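
To make "point-in-time correct" concrete, here is a minimal sketch in plain Python. It is not Zipline's API or implementation: the function name `point_in_time_join`, the tuple layouts, and the sample data are all hypothetical, and the choice to include feature events stamped exactly at the label time is an assumption. The idea is that each training row only sees the feature value that was known at the time of its label, so nothing from the future leaks into the training set.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_events):
    """Attach to each label the latest feature value known at the label's time.

    labels:         iterable of (key, label_ts, label)      -- hypothetical layout
    feature_events: iterable of (key, event_ts, value)      -- hypothetical layout
    Returns a list of (key, label_ts, feature_value, label) rows.
    """
    # Group feature events by key and sort each group by timestamp.
    grouped = defaultdict(list)
    for key, event_ts, value in feature_events:
        grouped[key].append((event_ts, value))

    timestamps, values = {}, {}
    for key, events in grouped.items():
        events.sort(key=lambda e: e[0])
        timestamps[key] = [ts for ts, _ in events]
        values[key] = [v for _, v in events]

    rows = []
    for key, label_ts, label in labels:
        ts_list = timestamps.get(key, [])
        # Index of the first event strictly after label_ts; the event just
        # before it is the latest one at or before the label time (assumption:
        # events at exactly label_ts are allowed).
        idx = bisect_right(ts_list, label_ts)
        feature = values[key][idx - 1] if idx > 0 else None
        rows.append((key, label_ts, feature, label))
    return rows


# Hypothetical sample data: (key, timestamp, value).
feature_events = [("user_1", 10, 1), ("user_1", 20, 3), ("user_2", 15, 7)]
labels = [("user_1", 12, 0), ("user_1", 25, 1), ("user_2", 5, 1)]

training_rows = point_in_time_join(labels, feature_events)
```

In this sketch the label for `user_2` at time 5 gets no feature value, because the only event for that user happens later, at time 15. A naive join on the key alone would have attached that future value and leaked information into the training set.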

Spark powers many of Zipline’s features, especially offline tasks for efficient training set backfills and feature computation. This talk covers Zipline’s architecture and the main problems that Zipline solves. Although these problems are widespread, no open source software addresses them. As a result, we intend to open source our work.

Session hashtag: #ML3SAIS

About Nikhil Simha

Nikhil is a Software Engineer on the Machine Learning infrastructure team at Airbnb. He is currently working on Zipline. Prior to Airbnb, he worked on the stream processing platform at Facebook. He is also a co-author of Realtime Data Processing at Facebook (SIGMOD-16). Nikhil received his Bachelor’s degree in Computer Science from the Indian Institute of Technology, Bombay.