Large-Scale Ads CTR Prediction with Spark and Deep Learning: Lessons Learned

Download Slides

CTR prediction algorithms are essential, and are used extensively for ads bidding and sponsored search. While logistic regression models have proven effective for this kind of problem, rapid growth in the amount of data has created a lot of challenges. For example, how to train a logistic regression model with billions of parameters in a commodity hardware cluster, or how to improve the model’s accuracy with better feature engineering. Other challenges include figuring out how to benefit from popular deep learning technologies to reduce the dependence on human labor and expert knowledge, and how to improve job performance given such a complicated workload.
At Spark Summit East 2017, Hortonworks introduced vector-free L-BFGS to conquer the scalability challenge of MLlib and provide a very scalable logistic regression implementation. In this talk, hear about their experience integrating this implementation with different feature learning technologies to solve Ad CTR prediction problems, and the lessons they learned.

Session hashtag: #SFds1