Deploying machine learning models seems like it should be a relatively easy task: take your model and pass it some features in production. In reality, the code written during the prototyping phase of model development doesn’t always work when applied at scale or on “real” data. This talk will explore 1) common problems at the intersection of data science and data engineering, 2) how you can structure your code so there is minimal friction between prototyping and production, and 3) how you can use Apache Spark to run predictions with your models in batch or streaming contexts.
You will learn how to address some of the productionizing issues that data scientists and data engineers face when deploying machine learning models at scale. You will also come away with a better understanding of how to work collaboratively to minimize the disparity between prototyping and productionizing.
Session hashtag: #SAISDS2
Brandon is a principal data engineer at Eventbrite. He began using Spark in 2014 to help law enforcement find and recover victims of human trafficking. Lately he has been dedicated to building Eventbrite's data infrastructure around Apache Spark and related tools.
Alex is a senior data engineer at Eventbrite. He began using Spark in 2014 to build event-based recommendations. Since then he has been building, extending, and optimizing Eventbrite's data infrastructure on behalf of its analysts, data scientists, and other engineers. From ingesting new data to creating downstream processes, he has been a primary driver of Eventbrite's growth in the world of big data.