SparkML: Easy ML Productization for Real-Time Bidding

dataxu bids on ads in real time on behalf of its customers at a rate of 3 million requests per second, and trains on past bids to optimize future ones. Our system trains thousands of advertiser-specific models and runs on multi-terabyte datasets. In this presentation we will share the lessons learned from our transition to a fully automated Spark-based machine learning system and how this has drastically reduced the time it takes to get a research idea into production. We’ll also share how we:

– continually ship models to production
– train models in an unattended fashion with auto-tuning capabilities
– tune and overbook cluster resources for maximum performance
– ported our previous ML solution to Spark
– evaluate the performance of high-rate bidding models

 

See More Spark + AI Summit in San Francisco 2019 Videos


About Maximo Gurmendez

Maximo holds a master's degree in computer science/AI from Northeastern University, which he attended as a Fulbright Scholar. As Chief Engineer of Montevideo Labs, he leads data science engineering projects for complex systems in large US companies. He is an expert in big data technologies and co-author of the book 'Mastering Machine Learning on AWS.' Maximo is also a computer science professor at the University of Montevideo and director of its data science for business program.

About Javier Buquet

Javier holds a degree in Computer Science from ORT University (Montevideo) and has worked at Montevideo Labs since 2015 as a Senior Data Engineer on large big data projects. He has helped top tech companies architect their Spark applications, leading many successful projects from design and implementation through deployment. He is also an advocate of clean code as a central paradigm for development.