Welcome to Data Brew by Databricks with Denny and Brooke! In this series, we explore various topics in the data and AI community and interview subject matter experts in data engineering/data science. So join us with your morning brew in hand and get ready to dive deep into data + AI!
For our second season, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more.
In the season opener, Matei Zaharia discusses how he entered the field of ML, best practices for productionizing ML pipelines, leveraging MLflow & the Data Lakehouse architecture for reproducible ML, and his current research in this field.
Have you ever wondered how your purchasing behavior may reveal protected attributes? Or how data scientists and business play a role in combating bias? We discuss with Diana Pfeil recommendations to reduce bias and improve fairness, from SHAP to adversarial debiasing.
Adam Oliner discusses how to design your infrastructure to support ML, from integration tests to glue code, the importance of iteration, and centralized vs decentralized data science teams. He provides valuable advice for companies investing in ML and crucial lessons he’s learned from founding two companies.
Brooke Wenig is a Machine Learning Practice Lead at Databricks. She leads a team of data scientists who develop large-scale machine learning pipelines for customers, as well as teach courses on distributed machine learning best practices. Previously, she was a Principal Data Science Consultant at Databricks. She received an MS in Computer Science from UCLA with a focus on distributed machine learning. She speaks Mandarin Chinese fluently and enjoys cycling.
Denny Lee is a Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premise and cloud environments. He also has a Masters of Biomedical Informatics from Oregon Health and Sciences University and has architected and implemented powerful data solutions for enterprise Healthcare customers. His current technical focuses include Distributed Systems, Apache Spark, Deep Learning, Machine Learning, and Genomics.
Contact the Data Brew team on Twitter: @databrew_db
Brooke & Denny are two of the co-authors of Learning Spark, 2nd edition.