Conde Nast is a global leader in media production, housing iconic brands such as The New Yorker, Wired, Vanity Fair, and Epicurious, among many others. Along with our content production, Conde Nast invests heavily in companion products to improve and enhance our audience’s experience. One such product is Spire, Conde Nast’s service for user segmentation and targeted advertising for over a hundred million users. Spire consists of thousands of models, many of which require individual scheduling and optimization. From data preparation to model training to inference, we’ve built abstractions around the data flow, monitoring, orchestration, and other internal operations. In this talk, we explore the complexities of building large-scale machine learning pipelines within Spire and discuss some of the solutions we’ve discovered using Databricks, MLflow, and Apache Spark. The key focus is on production-grade engineering patterns, the inner workings of the required components, and the lessons learned throughout their development.
I am a former academic and current programmer. I enjoy everything to do with data, including data science, data engineering, and machine learning, and I believe it is important to have a holistic understanding of the data pipeline. I am an Insight Data Engineering Fellow, and through Insight I have gained experience with AWS, cloud computing, machine learning, database management, and algorithmic coding.
I'm a Software Engineer based in NYC.