Nikhil Simha

Senior Engineer, Airbnb

Nikhil is a Software Engineer on the Machine Learning infrastructure team at Airbnb. He is currently working on Bighead, an end-to-end machine learning platform. Prior to Airbnb, he built Turbine, a self-healing scheduler, and Stylus, a real-time data processing engine, at Facebook. He is also a co-author of Realtime Data Processing at Facebook (SIGMOD-16) and Bighead (DSAA-2019). Nikhil received his Bachelor's degree in Computer Science from the Indian Institute of Technology, Bombay.

Past sessions

Summit 2021 Sawtooth Windows for Feature Aggregations

May 28, 2021 11:05 AM PT

In this talk about Zipline, we will introduce a new type of windowing construct called a sawtooth window. We will describe the properties of sawtooth windows that we use to achieve online-offline consistency while still maintaining high throughput, low read latency, and tunable write latency for serving machine learning features. We will also talk about a simple deployment strategy for correcting feature drift caused by operations over change data that are not "abelian groups".
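As a rough illustration of why non-"abelian group" operations can drift over change data, consider the sketch below. It is not Zipline's implementation, and the function names and the (operation, value) event format are hypothetical. It relies only on the general fact that a sum has an inverse (subtraction), so a delete event can be applied to the running state, whereas a max has no inverse, so a naive incremental update goes stale until the value is recomputed from the surviving rows.

```python
from collections import Counter

# Hypothetical change-data events: (operation, value)
changes = [("insert", 5), ("insert", 9), ("delete", 9)]

def incremental_sum(events):
    # Sum forms an abelian group: every insert can be undone by subtracting.
    total = 0
    for op, value in events:
        total += value if op == "insert" else -value
    return total

def incremental_max_naive(events):
    # Max has no inverse: a delete cannot be applied to the running state,
    # so this naive version ignores deletes and can drift from the truth.
    current = None
    for op, value in events:
        if op == "insert":
            current = value if current is None else max(current, value)
    return current

def recomputed_max(events):
    # Correction by recomputation: rebuild the set of live values, then take max.
    live = Counter()
    for op, value in events:
        live[value] += 1 if op == "insert" else -1
    remaining = [v for v, n in live.items() if n > 0]
    return max(remaining) if remaining else None

sum_result = incremental_sum(changes)
naive_max_result = incremental_max_naive(changes)
corrected_max_result = recomputed_max(changes)
```

Here the incremental sum stays correct after the delete, while the naive max still reports the deleted value until it is recomputed.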

In this session watch:
Nikhil Simha, Senior Engineer, Airbnb

Summit 2020 Zipline – A Declarative Feature Engineering Framework

June 23, 2020 05:00 PM PT

Zipline is Airbnb's data management platform specifically designed for ML use cases. Previously, ML practitioners at Airbnb spent roughly 60% of their time collecting data and writing transformations for machine learning tasks. Zipline reduces this task from months to days by making the process declarative. It allows data scientists to define features in a simple configuration language. The framework then provides access to point-in-time correct features for both offline model training and online inference. In this talk we will describe the architecture of our system and the algorithm that makes efficient point-in-time correct feature generation tractable.
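To make "point-in-time correct" concrete, here is a minimal sketch of the general idea, not Zipline's algorithm. For each training example, it returns the feature value as it was known at that example's timestamp, never a later one. The function name, the log format, and the choice to include values written exactly at the query timestamp are all assumptions made for illustration.

```python
from bisect import bisect_right

def point_in_time_value(feature_log, query_ts):
    """Return the latest feature value written at or before query_ts.

    feature_log is a hypothetical list of (timestamp, value) pairs,
    sorted by timestamp. Returns None if no value existed yet.
    """
    timestamps = [ts for ts, _ in feature_log]
    idx = bisect_right(timestamps, query_ts)  # count of entries with ts <= query_ts
    return feature_log[idx - 1][1] if idx > 0 else None

# Hypothetical feature history for one entity
feature_log = [(10, 1), (20, 3), (30, 7)]

before_any = point_in_time_value(feature_log, 5)
exactly_at = point_in_time_value(feature_log, 20)
between = point_in_time_value(feature_log, 25)
after_all = point_in_time_value(feature_log, 35)
```

A lookup that simply took the latest value (7) for every example would leak future information into training rows with earlier timestamps.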

Zipline is Airbnb's data management platform specifically designed for ML use cases. Previously, ML practitioners at Airbnb spent roughly 60% of their time collecting data and writing transformations for machine learning tasks. Zipline reduces this task from months to days. It allows users to define features in an easy-to-use configuration language, and then provides the following capabilities: resource-efficient, point-in-time correct training set backfills and scheduled updates; feature visualizations and automatic data quality monitoring; feature availability in the online scoring environment, both batch and streaming with batch correction (lambda architecture); collaboration and sharing of features; and data ownership and management.

Spark powers many of Zipline's features, especially offline tasks for efficient training set backfills and feature computation. This talk covers Zipline's architecture and the main problems that Zipline solves. Although these problems are widespread, no open source software addresses them. As a result, we intend to open source our work.

Session hashtag: #ML3SAIS