Fabricator: Streamlining Declarative Feature Engineering at DoorDash
OVERVIEW
EXPERIENCE | In Person |
---|---|
TYPE | Breakout |
TRACK | Data Science and Machine Learning |
INDUSTRY | Enterprise Technology, Retail and CPG - Food |
TECHNOLOGIES | AI/Machine Learning, Apache Spark, Delta Lake |
SKILL LEVEL | Intermediate |
DURATION | 40 min |
Feature engineering is a crucial part of machine learning, and it poses challenges that general data engineering does not. We developed Fabricator, a framework for building declarative feature pipelines for machine learning at DoorDash. Fabricator orchestrates 1,400 daily batch jobs and manages 2.2 trillion feature values across all business verticals. It consists of three components: a job registry, a library for large-scale data ELT jobs, and an orchestration and execution service. Together they streamline feature development through a declarative feature DSL and a centralized repository, accelerate data fabrication with a high-level SDK, reduce latency and consistency discrepancies between offline and online feature data, and automate operational tasks such as batch ETL jobs, feature uploads, and real-time feature computation. We will discuss how we used Databricks Jobs and Delta Lake to build Fabricator and share what we learned.
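To make the idea of a declarative feature definition and a central job registry concrete, here is a minimal sketch in Python. It is an illustration only, not Fabricator's actual DSL or SDK: `FeatureSpec`, `SourceSpec`, `Registry`, the `orders` table, and every field name are hypothetical and chosen for this example. The point it shows is that features are described as data, and the system derives the query and the operational tasks from that description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FeatureSpec:
    """Hypothetical declaration of one feature (not Fabricator's real schema)."""
    name: str
    expression: str              # aggregation stated as data, not as pipeline code
    upload_online: bool = False  # whether the value should also be served online


@dataclass
class SourceSpec:
    """Hypothetical declaration of one batch job that produces several features."""
    name: str
    table: str
    entity: str                  # key the features are grouped and joined on
    features: List[FeatureSpec] = field(default_factory=list)

    def to_sql(self) -> str:
        # Derive the batch query from the declaration instead of hand-writing it.
        columns = ", ".join(f"{f.expression} AS {f.name}" for f in self.features)
        return (
            f"SELECT {self.entity}, {columns} "
            f"FROM {self.table} GROUP BY {self.entity}"
        )


class Registry:
    """Hypothetical central registry holding every declared source."""

    def __init__(self) -> None:
        self._sources: Dict[str, SourceSpec] = {}

    def register(self, source: SourceSpec) -> SourceSpec:
        if source.name in self._sources:
            raise ValueError(f"source already registered: {source.name}")
        self._sources[source.name] = source
        return source

    def plan(self) -> List[Tuple[str, str]]:
        # Expand declarations into the tasks an orchestrator would schedule:
        # one compute task per source, one upload task per online feature.
        tasks: List[Tuple[str, str]] = []
        for source in self._sources.values():
            tasks.append(("compute", source.name))
            for feature in source.features:
                if feature.upload_online:
                    tasks.append(("upload", feature.name))
        return tasks


registry = Registry()
store_stats = registry.register(
    SourceSpec(
        name="store_stats",
        table="orders",
        entity="store_id",
        features=[
            FeatureSpec("order_count", "COUNT(*)", upload_online=True),
            FeatureSpec("avg_subtotal", "AVG(subtotal)"),
        ],
    )
)

print(store_stats.to_sql())
print(registry.plan())
```

In this sketch, adding a feature means adding one `FeatureSpec` entry. The query and the upload task follow from the declaration, which is the general benefit a declarative approach aims for.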
SESSION SPEAKERS
Kunal Shah
Software Engineer
DoorDash
Hebo Yang
ML Infra Eng
DoorDash