Yuan Feng

Software Development Engineer, Zillow

Yuan Feng is a software engineer on Zillow’s Big Data team. He has been building a self-service platform that automates the ETL building process, along with business datasets and data processing libraries that leverage Apache Spark and Apache Beam. Before joining Zillow, he built ETL pipelines and ML models at Tencent. He holds a master’s degree from the School of Computer Science at Carnegie Mellon University.

Past sessions

Summit 2021 Empowering Zillow’s Developers with Self-Service ETL

May 26, 2021 03:50 PM PT

As the amount of data and the number of unique data sources within an organization grow, handling the volume of new pipeline requests becomes difficult. Not all new pipeline requests are created equal — some are for business-critical datasets, others are for routine data preparation, and others are for experimental transformations that allow data scientists to iterate quickly on their solutions.

To meet the growing demand for new data pipelines, Zillow created multiple self-service solutions that enable any team to build, maintain, and monitor their data pipelines. These tools abstract away the orchestration, deployment, and Apache Spark processing implementation from their respective users. In this talk, Zillow engineers discuss two internal platforms they created to address the specific needs of two distinct user groups: data analysts and data producers. Each platform addresses the use cases of its intended user, leverages internal services through its modular design, and empowers users to create their own ETL without having to worry about how the ETL is implemented.

Members of Zillow’s data engineering team discuss:

  • Why they created two separate user interfaces to meet the needs of different user groups
  • What degree of abstraction from the orchestration, deployment, processing, and other ancillary tasks they chose for each user group
  • How they leveraged internal services and packages, including their Apache Spark package, Pipeler, to democratize the creation of high-quality, reliable pipelines within Zillow