Emilie de Longueau is a Senior Software Engineer (Machine Learning) in the Communities AI team at LinkedIn, where she focuses on driving member engagement through personalized, scalable Follow Recommendations for hundreds of millions of members. She has 5 years of industry experience in data science and machine learning and in building big data solutions and algorithms with Spark. Emilie holds Master’s degrees in Industrial Engineering and Operational Research from the University of California, Berkeley, and Ecole des Ponts ParisTech (Paris). Her expertise in Apache Spark has helped her team modernize its offline scoring infrastructure, improving the scalability and relevance of Follow Recommendations.
June 25, 2020 05:00 PM PT
The Communities AI team at LinkedIn generates follow recommendations from a large set of entities (tens of millions) for each of our 690+ million members. These recommendations are driven by ML models that rely on three sets of features: member, entity, and interaction features. To support a fast-growing user base, an expanding set of recommendable entities (members, companies, hashtags, groups, newsletters, etc.) and more sophisticated modeling approaches, we re-engineered the system to allow efficient offline scoring in Spark. In particular, we handled the 'explosive' growth of data by developing a 2D Hash-Partitioned Join algorithm that optimizes the join of hundreds of terabytes of features without requiring significant data shuffling. Beyond a 5X runtime improvement, this made it possible to train and score with a suite of non-linear models such as XGBoost, which improved the global follow rate on the platform by 15% and downstream engagement on the LinkedIn feed with content from followed entities by 10%.
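The abstract does not spell out how the 2D Hash-Partitioned Join works, so the following is only an illustrative sketch of the general idea of a two-dimensional hash-partitioned join, not LinkedIn's implementation. It assumes that member features are hashed into `m_buckets` partitions, entity features into `e_buckets` partitions, and that each (member, entity) interaction row is routed to grid cell (i, j). Each cell then reads only member partition i and entity partition j, so every cell can be joined independently. The helper names `bucket` and `two_d_hash_join`, the CRC32 hash, and the dictionary-based feature layout are all hypothetical, and the code runs in a single process rather than on Spark.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Features = List[float]
Row = Tuple[str, str, Features, Features, Features]


def bucket(key: str, n: int) -> int:
    """Hypothetical helper: deterministic hash partition (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % n


def two_d_hash_join(
    member_feats: Dict[str, Features],
    entity_feats: Dict[str, Features],
    interaction_feats: Dict[Tuple[str, str], Features],
    m_buckets: int,
    e_buckets: int,
) -> List[Row]:
    """Hypothetical sketch: inner-join three feature sets on an m_buckets x e_buckets grid."""
    # Dimension 1: partition member features by member id.
    member_parts: Dict[int, Dict[str, Features]] = defaultdict(dict)
    for mid, feats in member_feats.items():
        member_parts[bucket(mid, m_buckets)][mid] = feats

    # Dimension 2: partition entity features by entity id.
    entity_parts: Dict[int, Dict[str, Features]] = defaultdict(dict)
    for eid, feats in entity_feats.items():
        entity_parts[bucket(eid, e_buckets)][eid] = feats

    # Route each (member, entity) interaction row to grid cell (i, j).
    cells: Dict[Tuple[int, int], List[Tuple[str, str, Features]]] = defaultdict(list)
    for (mid, eid), feats in interaction_feats.items():
        cells[(bucket(mid, m_buckets), bucket(eid, e_buckets))].append((mid, eid, feats))

    rows: List[Row] = []
    for (i, j), pairs in cells.items():
        # A cell reads only member partition i and entity partition j,
        # so each cell could be processed as an independent task.
        m_part = member_parts.get(i, {})
        e_part = entity_parts.get(j, {})
        for mid, eid, feats in pairs:
            if mid in m_part and eid in e_part:  # inner-join semantics
                rows.append((mid, eid, m_part[mid], e_part[eid], feats))
    # Sort by (member, entity) so the output does not depend on bucket counts.
    return sorted(rows, key=lambda r: (r[0], r[1]))


# Usage: "m3" has no member features, so its interaction row is dropped by the inner join.
members = {"m1": [0.1], "m2": [0.2]}
entities = {"e1": [1.0], "e2": [2.0]}
interactions = {("m1", "e1"): [5.0], ("m2", "e2"): [7.0], ("m3", "e1"): [9.0]}
print(two_d_hash_join(members, entities, interactions, m_buckets=2, e_buckets=3))
```

In a distributed version of this sketch, each member partition would be replicated to the `e_buckets` cells in its row and each entity partition to the `m_buckets` cells in its column; choosing the bucket counts trades that replication cost against the memory each task needs.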