Empowering Zillow’s Developers with Self-Service ETL

May 26, 2021 03:50 PM (PT)

As the amount of data and the number of unique data sources within an organization grow, handling the volume of new pipeline requests becomes difficult. Not all new pipeline requests are created equal — some are for business-critical datasets, others are for routine data preparation, and others are for experimental transformations that allow data scientists to iterate quickly on their solutions.

To meet the growing demand for new data pipelines, Zillow created multiple self-service solutions that enable any team to build, maintain, and monitor their data pipelines. These tools abstract away the orchestration, deployment, and Apache Spark processing implementation from their respective users. In this talk, Zillow engineers discuss two internal platforms they created to address the specific needs of two distinct user groups: data analysts and data producers. Each platform addresses the use cases of its intended user, leverages internal services through its modular design, and empowers users to create their own ETL without having to worry about how the ETL is implemented.

Members of Zillow’s data engineering team discuss:

  • Why they created two separate user interfaces to meet the needs of different user groups
  • What degree of abstraction from the orchestration, deployment, processing, and other ancillary tasks they chose for each user group
  • How they leveraged internal services and packages, including their Apache Spark package, Pipeler, to democratize the creation of high-quality, reliable pipelines within Zillow
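
To make the idea of abstracting the implementation away from the user concrete, here is a minimal sketch of a declarative, self-service pipeline definition. It is not Pipeler's API, which the abstract does not describe. Every name in it (`PipelineSpec`, `TRANSFORMS`, `register`, `run`, and the `filter_equals` and `select` operations) is hypothetical, and plain Python lists of dictionaries stand in for Apache Spark DataFrames so the example runs without a cluster.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical registry of transformations owned by the platform team.
# Users refer to operations by name and never see the implementation.
TRANSFORMS: Dict[str, Callable[..., Callable[[List[Row]], List[Row]]]] = {}


def register(name: str):
    """Register a transformation factory under a user-facing name."""
    def decorator(factory):
        TRANSFORMS[name] = factory
        return factory
    return decorator


@register("filter_equals")
def filter_equals(column: str, value: object):
    # Keep only the rows whose `column` equals `value`.
    return lambda rows: [r for r in rows if r.get(column) == value]


@register("select")
def select(columns: List[str]):
    # Project each row down to the requested columns.
    return lambda rows: [{c: r.get(c) for c in columns} for r in rows]


@dataclass
class PipelineSpec:
    """What a user writes: a name and an ordered list of named steps."""
    name: str
    steps: List[dict] = field(default_factory=list)


def run(spec: PipelineSpec, rows: List[Row]) -> List[Row]:
    """What the platform does: resolve each step by name and apply it in order."""
    result = list(rows)
    for step in spec.steps:
        params = {k: v for k, v in step.items() if k != "op"}
        result = TRANSFORMS[step["op"]](**params)(result)
    return result


# A user-authored pipeline: pure configuration, no processing code.
spec = PipelineSpec(
    name="active_listings",
    steps=[
        {"op": "filter_equals", "column": "status", "value": "active"},
        {"op": "select", "columns": ["id", "price"]},
    ],
)
```

In a sketch like this, the person defining `spec` only declares what should happen. The platform could swap the list-based implementations for Spark jobs, or change how pipelines are scheduled and deployed, without the specification changing.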

 

Derek Gorthy

Derek Gorthy is a senior software engineer on Zillow’s Big Data team. He is currently focused on leveraging Apache Spark to design the next generation of pipelines for the Zillow Offers business. Pr...

Yuan Feng

Yuan Feng is a software engineer on Zillow’s Big Data team. He has been working on building the self-service platform to automate the ETL building process, building business datasets, data processing li...