Nedra Albrecht is a senior data engineer at Zillow with over 20 years of experience working with data. Nedra has worked extensively with data modeling and data architecture across every major data paradigm, including transactional data processing (OLTP), data warehousing (OLAP), and now big data. She is currently focused on architecting scalable, generic, config-driven data pipeline systems that handle the wide variety of data sources that Zillow consumes.
The trade-off between development speed and pipeline maintainability is a constant for data engineers, especially in a rapidly evolving organization. New data-source ingestions are frequently added on an as-needed basis, making it difficult to leverage shared functionality across pipelines. Identifying when technical debt has become prohibitive for an organization can be difficult, and remedying it can be even more so. As the Zillow data engineering team grappled with their own technical debt, they identified the need for stronger data quality enforcement, the consolidation of shared pipeline functionality, and a scalable way to implement complex business logic for their downstream data scientists and machine learning engineers.
In this talk, the Zillow team explains how they designed their new end-to-end pipeline architecture to make the creation of additional pipelines robust, maintainable, and scalable, all while writing fewer lines of code with Apache Spark.
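To make the "config-driven pipeline" idea concrete, here is a minimal sketch of the general pattern the abstract describes: each data source is declared as configuration, and one shared runner applies validation and reusable transforms, so adding a pipeline means adding config rather than code. All names here (`SourceConfig`, `run_pipeline`, the transform registry) are hypothetical illustrations, not Zillow's actual implementation; in practice the row lists below would be Spark DataFrames.

```python
# Illustrative sketch of a config-driven pipeline (hypothetical names,
# plain Python stand-ins for what would be Spark DataFrame operations).
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class SourceConfig:
    name: str
    fmt: str                                   # e.g. "json", "csv", "parquet"
    path: str                                  # source location
    required_columns: List[str] = field(default_factory=list)
    transforms: List[str] = field(default_factory=list)  # ordered step names

def validate(rows: List[dict], required: List[str]) -> List[dict]:
    """Minimal data-quality gate: drop rows missing any required field."""
    return [r for r in rows if all(c in r and r[c] is not None for c in required)]

# Shared transform registry: every pipeline draws from the same tested steps.
TRANSFORMS: Dict[str, Callable[[List[dict]], List[dict]]] = {
    "dedupe": lambda rows: [dict(t) for t in {tuple(sorted(r.items())) for r in rows}],
    "lowercase_keys": lambda rows: [{k.lower(): v for k, v in r.items()} for r in rows],
}

def run_pipeline(cfg: SourceConfig, rows: List[dict]) -> List[dict]:
    """Generic runner: validate, then apply the configured transform chain."""
    rows = validate(rows, cfg.required_columns)
    for step in cfg.transforms:
        rows = TRANSFORMS[step](rows)
    return rows
```

A new ingestion then needs only a `SourceConfig` entry, e.g. `SourceConfig(name="listings", fmt="json", path="s3://example-bucket/listings", required_columns=["id"], transforms=["dedupe"])`, while validation and transforms remain shared and tested in one place.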
Members of Zillow's data engineering team discuss: