Scaling Factorization Machines on Spark Using Parameter Servers

Factorization machines (FMs) are a relatively new class of models that can efficiently capture interactions of arbitrary order between features. FMs are becoming increasingly popular in settings with large amounts of sparse data, including recommender systems and online advertising. Furthermore, with appropriate feature engineering, they can mimic most commonly used factorization-based models for collaborative filtering. However, although FMs are relatively efficient to train, they can still be difficult to scale to very large feature dimensions. This talk will explore scaling up FMs on Spark using the Glint parameter server, which is built on Akka. Rather than a general exploration of parameter server architectures, the focus will be on specific technical aspects of training factorization machines, with code examples, performance analysis, and comparisons. It will also cover integration with Spark DataFrames and ML pipelines for feature engineering and cross-validation. Example code will be available as open source.
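To make the efficiency claim concrete, here is a minimal sketch of scoring one sparse example with a second-order FM, the most commonly used form. It is not the talk's code and does not use the Glint or Spark APIs; the names `fm_predict`, `fm_predict_naive`, `w0`, `w`, and `V` are illustrative assumptions. It uses the standard identity that rewrites the sum over all feature pairs as a difference of squared sums, so the cost is linear in the number of non-zero features times the factor dimension.

```python
from itertools import combinations


def fm_predict(x, w0, w, V):
    """Score a sparse example with a second-order FM.

    x  : dict mapping feature index -> non-zero value (hypothetical input format)
    w0 : global bias
    w  : list of per-feature linear weights
    V  : list of per-feature latent factor vectors, each of length k
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # sum_{i<j} v_if * v_jf * x_i * x_j
        #   = 0.5 * ((sum_i v_if * x_i)^2 - sum_i (v_if * x_i)^2)
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Reference version that loops over every pair of active features."""
    linear = sum(w[i] * xi for i, xi in x.items())
    pairwise = 0.0
    for i, j in combinations(sorted(x), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise
```

Only the rows of `w` and `V` for the active features of an example are read. In a parameter-server setting, this is presumably what lets each worker fetch a small slice of a very large model per example, though how the talk's implementation does so is not described here.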