Data Brew
Let’s talk data

Welcome to Data Brew by Databricks with Denny and Brooke!

In this series, we explore various topics in the data and AI community and interview experts in data engineering and data science. So join us with your morning brew in hand and get ready to dive deep into data and AI.

 


Season 2

For our second season, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more.


Miss Season 1?

You can still catch Season 1, which covers Data Lakehouses, on YouTube and on your favorite podcast service, such as Spotify or Apple Music.

Episodes


S02-E01
ML in Production

In the season opener, Matei Zaharia discusses how he entered the field of ML, best practices for productionizing ML pipelines, leveraging MLflow & the Data Lakehouse architecture for reproducible ML, and his current research in this field.



S02-E02
Data Ethics

Have you ever wondered how your purchasing behavior may reveal protected attributes? Or how data scientists and businesses play a role in combating bias? Diana Pfeil joins us to discuss recommendations for reducing bias and improving fairness, from SHAP to adversarial debiasing.



S02-E03
Infrastructure for ML

Adam Oliner discusses how to design your infrastructure to support ML, from integration tests to glue code, the importance of iteration, and centralized vs decentralized data science teams. He provides valuable advice for companies investing in ML and crucial lessons he’s learned from founding two companies.



S02-E04
Hyperparameter and Neural Architecture Search

Liam Li is a leading researcher in the fields of hyperparameter optimization and neural architecture search, and is the author of the seminal Hyperband paper. In this session, Liam discusses the evolution of hyperparameter optimization techniques and illustrates how every data scientist can benefit from neural architecture search.



S02-E05
ML Applications

Good machine learning starts with high-quality data. Irina Malkova shares her experience ensuring high-fidelity data and developing custom metrics to satisfy business needs, and discusses how to improve internal decision-making processes.



S02-E06
AutoML

Erin LeDell shares her insights on AutoML: which problems it solves best, its current limitations, and where she sees it heading. We also discuss founding and growing the Women in Machine Learning and Data Science (WiMLDS) non-profit.



S02-E07
Interpretable Machine Learning

What does it mean for a model to be “interpretable”? Ameet Talwalkar shares his thoughts on IML (Interpretable Machine Learning), how it relates to data privacy and fairness, and his research in this field.



S02-E08
Feature Engineering

Is there ever a “one-size-fits-all” approach to feature engineering? Find out this and more with Amanda Casari and Alice Zheng, co-authors of the book Feature Engineering for Machine Learning.



S02-E09
Data Driven Software

We branch, version, and test our code, but what if we treated data like code? Tim Hunter joins us to discuss the open-source Data-Driven Software (DDS) package and how it leads to immense gains in collaboration and decreased runtime for data scientists at any organization.


About the hosts


Brooke Wenig

Brooke Wenig is a Director of the Machine Learning Practice at Databricks. She leads a team of data scientists who develop large-scale machine learning pipelines for customers and teach courses on distributed machine learning best practices. Previously, she was a Principal Data Science Consultant at Databricks. She received an M.S. in Computer Science from UCLA with a focus on distributed machine learning. She speaks Mandarin Chinese fluently and enjoys cycling.


Denny Lee

Denny Lee is a Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premises and cloud environments. He has a Master of Biomedical Informatics from Oregon Health & Science University and has architected and implemented powerful data solutions for enterprise healthcare customers. His current technical focuses include distributed systems, Apache Spark, deep learning, machine learning, and genomics.

Brooke and Denny are two of the co-authors of Learning Spark, 2nd edition.

Contact the Data Brew team on Twitter: @databrew_db or on LinkedIn