Spotify uses a range of machine learning models to power its music recommendation features, including the Discover page and Radio. Because these models are iterative, they are a natural fit for the Spark computation paradigm and suffer from the IO overhead incurred when run on Hadoop. In this talk, I will review two collaborative filtering models and how we’ve scaled them up to handle hundreds of billions of data points using Scala, Breeze, and Spark.
Chris Johnson is a machine learning engineer at Spotify where he hacks on music data, builds the best music recommendation system on the planet, and feeds multiple terabytes of data to Hadoop and Spark every day. Chris’s toolchest includes Scala, Python, Breeze, NumPy, scikit-learn, Hadoop, Scalding, Spark, and Cassandra. As both a researcher and an engineer, Chris is interested in high-dimensional problems and efficient methods of scaling learning in the presence of massive data sets. He is particularly interested in the scalability, design, and architecture decisions that arise within real-time recommender systems such as music recommendation. His research has been featured at premier machine learning conferences including NIPS and AISTATS. Chris holds MS and BS degrees from UT Austin.