Leah McGuire is a Principal Member of Technical Staff at Salesforce Einstein, building platforms to enable the integration of machine learning into Salesforce products. Before joining Salesforce, Leah was a Senior Data Scientist on the data products team at LinkedIn, working on personalization, entity resolution, and relevance for a variety of LinkedIn data products. She completed a PhD and a Postdoctoral Fellowship in Computational Neuroscience at the University of California, San Francisco, and the University of California, Berkeley, where she studied the neural encoding and integration of sensory signals.
Recommendation engines are proven drivers of sales. Competitions such as the Netflix Prize and those hosted on Kaggle have driven a great deal of research on recommendations. However, most real-world datasets don't give you as much to work with. Many e-commerce datasets lack explicit ratings, consisting solely of binary purchase information. This means that for most of our data, we don't know whether a missing value indicates disinterest or is simply unobserved. Such data requires special consideration and treatment for both model selection and validation of results. In this talk I will describe the implementation of a recommendation system for binary purchase data in Spark's MLlib, compare fitting and prediction benchmarks for various models, and illustrate how performance differs across data scales. Finally, I will share lessons learned in how to efficiently select and implement the best recommendation model for your dataset.
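The standard approach to binary purchase data is implicit-feedback matrix factorization (Hu, Koren & Volinsky, 2008), which Spark MLlib exposes as `ALS` with `implicitPrefs=true`: rather than treating unpurchased items as unknowns, every missing entry becomes a low-confidence negative, while purchases become high-confidence positives. As an illustration of the idea (not the talk's actual implementation), here is a minimal NumPy sketch of weighted ALS; the hyperparameters (`factors`, `alpha`, `reg`) are arbitrary choices for the toy example.

```python
import numpy as np

def implicit_als(R, factors=8, alpha=40.0, reg=0.1, iters=10, seed=0):
    """Weighted ALS for implicit feedback.

    R: binary user-item purchase matrix (users x items).
    Missing entries are treated as low-confidence negatives rather
    than as unknowns, which is the key difference from explicit ALS.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, factors))  # user factors
    Y = rng.normal(scale=0.1, size=(n_items, factors))  # item factors
    P = (R > 0).astype(float)   # binary preferences p_ui
    C = 1.0 + alpha * R         # confidence weights c_ui
    I = np.eye(factors)
    for _ in range(iters):
        # Alternate: solve user factors with item factors fixed, then swap.
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + reg * I, Y.T @ Cu @ P[u])
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + reg * I, X.T @ Ci @ P[:, i])
    return X, Y

# Toy purchase matrix: users 0-1 buy items 0-1, users 2-3 buy items 2-3.
R = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0]], dtype=float)
X, Y = implicit_als(R)
scores = X @ Y.T
# We expect user 1 to rank item 1 (bought by the similar user 0)
# above the items from the other block.
```

In Spark this corresponds to `new ALS().setImplicitPrefs(true).setAlpha(alpha).setRegParam(reg)`; the per-user and per-item linear solves are what MLlib distributes across the cluster.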
Salesforce has created a machine learning framework on top of Spark ML that builds personalized models for businesses across a range of applications. Hear how enriching features with type information has allowed them to handle custom datasets with good results. By building a platform that automatically does feature engineering on rich types (e.g. Currency and Percentages rather than Doubles; Phone Numbers and Email Addresses rather than Strings), they have automated much of the work that consumes most data scientists' time. Learn how you can do the same by building a single model outline based on the application, and then having the framework customize it for each customer. Session hashtag: #SFds2
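The core idea is that once a feature carries a rich type instead of a bare Double or String, the framework can pick the right transformations automatically. Here is a small illustrative sketch in Python; the type names and the specific engineering rules are hypothetical stand-ins for whatever the framework actually applies, chosen only to show type-driven dispatch.

```python
import math
from dataclasses import dataclass

# Hypothetical rich feature types; the real framework's names may differ.
@dataclass
class Currency:
    value: float

@dataclass
class Email:
    value: str

@dataclass
class Phone:
    value: str

def engineer(feature):
    """Dispatch feature engineering on the declared type,
    not on the raw representation (Double / String)."""
    if isinstance(feature, Currency):
        # Currency: log-scale to tame the heavy right skew of money amounts.
        return {"log_amount": math.log1p(max(feature.value, 0.0))}
    if isinstance(feature, Email):
        # Email: the domain is usually more predictive than the full address.
        domain = feature.value.split("@")[-1].lower()
        return {"email_domain": domain,
                "is_free_mail": domain in {"gmail.com", "yahoo.com"}}
    if isinstance(feature, Phone):
        # Phone: validity is a useful signal even when the number itself isn't.
        digits = "".join(ch for ch in feature.value if ch.isdigit())
        return {"phone_valid": 7 <= len(digits) <= 15}
    raise TypeError(f"no engineering rule for {type(feature).__name__}")
```

For example, `engineer(Email("ada@gmail.com"))` yields a domain feature plus a free-mail flag, with no per-customer code: the same model outline reuses these rules on every customer's schema because the types, not the column names, decide what happens.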
Leveraging your data to make better decisions is something every business wants to do, but doing it correctly involves weeks or months of work from highly skilled, hard-to-find individuals. However, many of the laborious steps a machine learning specialist follows in creating a custom application can be automated with enough compute power and flexibility. By leveraging Spark to do machine learning at scale, Salesforce Einstein has created a system that lets individuals with domain knowledge, but no machine learning expertise, create high-quality, high-impact machine learning applications.