Dr. Yacov Salomon has over 10 years of experience working with large, complex real-world data sets. He has headed multiple data science teams at companies including Salesforce, Krux, BigCommerce and Brandscreen, where he was responsible for the research, design and development of systems and applications for large-scale machine learning, data mining, real-time bidding, intelligent marketing, attribution and recommendations. Yacov holds a Ph.D. in applied mathematics; his academic research focused on probability and non-parametric statistics.
Krux, a Salesforce company, is a Data Management Platform (DMP) that helps its clients collect, manage, analyze and activate their people data. Its wide range of premium clients includes Kellogg, L’Oréal, Warner Brothers, The New York Times, The Washington Post, Uber, Spotify and many other household names, and Krux sees more than 3.5 billion unique users globally each month across sites, media, mobile apps, transactional and offline traffic sources. That is more than Facebook, Wikipedia and Twitter combined. Processing data at this volume and velocity has presented many challenges over the seven years Krux has existed, and the company has had to develop various proprietary strategies and technologies to overcome them.
In this session, Salesforce will share how Apache Spark in particular helped transform the DMP’s data processing infrastructure, using the evolution of its "Look-alike" algorithm as an example. Look-alike, a similarity-based classifier, is one of the algorithms most commonly used by marketers and publishers looking to extend their audience reach. Get a high-level introduction to the use case and the algorithm, and learn about Salesforce's experience moving the implementation from Hadoop to Spark and how the move improved the performance, reliability and serviceability of the product. You will also hear about some of the technical challenges the team faced, including large-scale joins with skewed data, and how they solved them in Spark. Learn why Spark's wide range of high-level and low-level APIs is more useful than Hadoop when implementing customized machine learning algorithms, and how its overall abstraction makes it easy to write modular, maintainable code that also performs well.
Session hashtag: #SFeco3
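Salting the join key is one widely used way to handle a join on skewed data. The abstract does not say which technique the Krux team actually used, so the following is only a minimal sketch of the general idea. It is written in plain Python rather than Spark so that it runs without a cluster. The function names `salted_join` and `bucket_sizes` and the `num_salts` parameter are hypothetical. Each row on the skewed (large) side gets a random salt, and each row on the small side is replicated once per salt value. A single hot key is therefore spread across `num_salts` join buckets instead of landing in one.

```python
import random
from collections import defaultdict


def salted_join(large, small, num_salts, seed=0):
    """Inner-join two lists of (key, value) pairs on key using key salting.

    Hypothetical helper for illustration. Each row of the skewed (large)
    side gets a random salt in [0, num_salts); each row of the small side is
    replicated once per salt, so every salted large-side row still finds its
    match. Returns a list of (key, large_value, small_value) tuples.
    """
    rng = random.Random(seed)
    # Replicate every small-side row under each possible salt value.
    small_index = defaultdict(list)
    for key, value in small:
        for salt in range(num_salts):
            small_index[(key, salt)].append(value)
    # Probe with a randomly salted key; a hot key now maps to many buckets.
    result = []
    for key, value in large:
        salt = rng.randrange(num_salts)
        for small_value in small_index.get((key, salt), []):
            result.append((key, value, small_value))
    return result


def bucket_sizes(large, num_salts, seed=0):
    """Count large-side rows per (key, salt) bucket (hypothetical helper)."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    for key, _ in large:
        counts[(key, rng.randrange(num_salts))] += 1
    return dict(counts)


# Usage: one "hot" key dominates the large side.
large = [("hot", i) for i in range(1000)] + [("cold", 0)]
small = [("hot", "H"), ("cold", "C")]
joined = salted_join(large, small, num_salts=8)
hot_buckets = [n for (k, _), n in bucket_sizes(large, 8).items() if k == "hot"]
print(len(joined), len(hot_buckets))
```

In Spark, the same idea is usually expressed by adding a salt column to both sides before the join. The trade-off is that the small side grows by a factor of `num_salts`, in exchange for spreading a hot key's rows over more tasks.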