Ayman Farahat - Databricks

Ayman Farahat

Distinguished Architect, Yahoo


3 Trillion App recommendations, with less than 100 Lines of Spark Code in less than 25 Minutes

Smartphones have become a staple of everyday life: the average user spends over three hours a day on their phone, the vast majority of it in apps. The number of apps has grown exponentially over the past several years, which makes app recommendation and discovery important to end users, app developers and advertising platforms alike. At Yahoo, our goal is to recommend the most relevant apps to each user from the tens of thousands available. The Spark framework has become an essential tool for the rapid development of highly reliable and scalable recommender systems. This talk focuses on the practical details of building a recommendation engine on top of Spark's MLlib ALS collaborative-filtering algorithm that can reliably and frequently generate app-rating predictions for hundreds of millions of users across tens of thousands of apps. The unique aspect of this work is threefold. First, we use Spark SQL and DataFrames to achieve orders-of-magnitude improvements in data preprocessing over baseline systems. Second, optimizations and modifications to the basic MLlib implementation allow us to frequently generate scores for every combination of user and app (over 6 trillion combinations). Third, we report results from randomized experiments that measure the marginal value of the recommendations.
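The abstract does not spell out how the scores for every user and app combination are computed. As a rough illustration only: ALS-style collaborative filtering learns a latent factor vector for each user and each app, and the predicted rating for a pair is the dot product of the two vectors. The sketch below is plain Python rather than Spark, and every name and number in it (`top_k_apps`, `user_factors`, `app_factors`, the toy factor values) is hypothetical, not taken from the talk. It scores every pair and keeps the highest-scoring apps per user.

```python
import heapq

def dot(u, v):
    """Dot product of two equal-length factor vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k_apps(user_factors, app_factors, k=2):
    """Score every (user, app) pair and keep the k best apps per user.

    user_factors / app_factors: dicts mapping an id to its latent vector,
    as a factorization model such as ALS would produce (toy values here).
    """
    results = {}
    for user_id, u_vec in user_factors.items():
        best = []  # min-heap of (score, app_id), never larger than k
        for app_id, a_vec in app_factors.items():
            score = dot(u_vec, a_vec)
            if len(best) < k:
                heapq.heappush(best, (score, app_id))
            elif score > best[0][0]:
                # New score beats the weakest kept app: swap it in.
                heapq.heapreplace(best, (score, app_id))
        # Highest score first.
        results[user_id] = [app for _, app in sorted(best, reverse=True)]
    return results

# Hypothetical two-dimensional factors, for illustration only.
user_factors = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
app_factors = {"mail": [0.9, 0.1], "news": [0.2, 0.8], "game": [0.5, 0.5]}

recs = top_k_apps(user_factors, app_factors)
print(recs)
```

Keeping only a small heap per user means the full score matrix is never stored. At the scale described in the abstract, that bookkeeping would have to be distributed across a cluster, and the talk's actual approach to doing so may differ from this sketch.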