Data can be viewed as the exhaust of online activity. With the rise of cloud-based data platforms, barriers to data storage and transfer have crumbled, and the demand for creative applications of, and learning from, the resulting datasets has accelerated. Rapid growth can quickly breed disorder, and disorderly data design can turn the deepest data lake into an impenetrable swamp.
In this talk, I will discuss the evolution of the data science workflow at Expedia, with a special emphasis on Learning to Rank problems. From the heroic early days of ad-hoc Spark exploration to our first production sort model in the cloud, we will explore the process of industrializing the workflow. Layered over our story, I will share some best practices and suggestions on how to keep your data productive, or even pull your organization out of the data swamp.
Grown in Hawaiʻi, Sean trained in mathematical physics at Texas A&M University with an emphasis on supergravity and string phenomenology. He jumped to industry with Expedia in 2015, where he is now a Senior Data Scientist. He presently works on assorted problems, particularly in auctions, marketplace design and Learning to Rank.