Leah McGuire is a Principal Member of Technical Staff at Salesforce Einstein, building platforms to enable the integration of machine learning into Salesforce products. Before joining Salesforce, Leah was a Senior Data Scientist on the data products team at LinkedIn, working on personalization, entity resolution, and relevance for a variety of LinkedIn data products. She completed a PhD and a postdoctoral fellowship in computational neuroscience at the University of California, San Francisco, and the University of California, Berkeley, where she studied the neural encoding and integration of sensory signals.
For a machine learning application to be successful, it is not enough to make highly accurate predictions: customers also want to know why the model made a prediction, so they can compare it against their intuition and (hopefully) gain trust in the model. There is, however, a trade-off between model accuracy and explainability: the more complex your feature transformations become, the harder it is to explain to the end customer what the resulting features mean. With the right system design, though, this does not have to be a binary choice between the two goals. It is possible to combine complex, even automatic, feature engineering with highly accurate models and explanations. We will describe how we use lineage tracing at Salesforce Einstein to solve this problem, allowing good model explanations to coexist with automatic feature engineering and model selection. By building this into TransmogrifAI, an open-source AutoML library that extends Spark MLlib, we ensure a consistent level of transparency across all of our ML applications. Because model explanations are provided out of the box, data scientists don't need to reinvent the wheel whenever explanations need to be surfaced.
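The core idea of lineage tracing can be sketched in a few lines. The names below are purely illustrative (this is not TransmogrifAI's actual API): each derived feature records its parent features, so that a weight the model assigns to a heavily transformed column can still be mapped back to raw input fields the end customer recognizes.

```python
# Illustrative sketch of feature-lineage tracing: every derived feature
# keeps references to its parents, so explanations can walk the graph
# back to the raw inputs. Not TransmogrifAI's real API.

class Feature:
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)

    def raw_ancestors(self):
        """Walk the lineage graph back to the raw (parent-less) inputs."""
        if not self.parents:
            return {self.name}
        return set().union(*(p.raw_ancestors() for p in self.parents))

    def transform(self, op):
        """Derive a new feature, recording this one as its parent."""
        return Feature(f"{op}({self.name})", [self])

    def combine(self, other, op):
        """Derive a feature from two parents, recording both."""
        return Feature(f"{op}({self.name}, {other.name})", [self, other])


income = Feature("annual_income")
debt = Feature("total_debt")

# Automated feature engineering may stack many transformations...
ratio = debt.combine(income.transform("log"), "ratio")

# ...but the lineage still points at the raw fields a user understands.
print(ratio.name)             # ratio(total_debt, log(annual_income))
print(ratio.raw_ancestors())  # {'total_debt', 'annual_income'}
```

In a real system the same graph walk lets an explanation like "this feature mattered most" be phrased in terms of `annual_income` and `total_debt` rather than an opaque engineered column name.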
Recommendation engines are proven drivers of sales. Competitions such as the Netflix Prize and Kaggle have driven a great deal of research on recommendation, but most real-world datasets don't give you as much to work with. Many e-commerce datasets lack explicit ratings, consisting solely of binary purchase information. This means that for most of our data, we don't know whether a missing value is actually missing or negative. Such data requires special consideration and treatment in both model selection and validation of results. In this talk I will describe the implementation of a recommendation system for binary purchase data in Spark's MLlib, compare fitting and prediction benchmarks across various models, and illustrate the performance differences at different scales of big data. Finally, I will share lessons learned about how to efficiently select and implement the best recommendation model for your dataset.
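One standard way to handle "missing or negative" binary data is the implicit-feedback weighting of Hu, Koren, and Volinsky, which is what Spark MLlib's ALS uses when `implicitPrefs=true`: every user-item pair gets a binary preference, and observed purchases get high confidence while unobserved pairs get low confidence instead of being treated as known negatives. A minimal sketch of that weighting (the `ALPHA` value is just an example hyperparameter):

```python
# Sketch of implicit-feedback confidence weighting (Hu, Koren & Volinsky),
# the scheme behind Spark MLlib's ALS with implicitPrefs=true. A missing
# purchase is a low-confidence negative, not a known negative.

ALPHA = 40.0  # confidence scaling; a tunable hyperparameter

def preference(r):
    """Binary preference: did the user buy the item at all?"""
    return 1.0 if r > 0 else 0.0

def confidence(r):
    """How strongly we trust that preference; grows with purchase count."""
    return 1.0 + ALPHA * r

# Example: purchase counts for one user across three items.
counts = {"itemA": 3, "itemB": 0, "itemC": 1}
for item, r in counts.items():
    print(item, preference(r), confidence(r))
# itemB gets preference 0.0 but confidence only 1.0: missing, not negative.
```

The model then minimizes a confidence-weighted squared error between these binary preferences and the dot products of user and item factors, so uncertain zeros contribute far less to the fit than observed purchases.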
Salesforce has created a machine learning framework on top of Spark ML that builds personalized models for businesses across a range of applications. Hear how expanding the type information attached to features has allowed them to handle each customer's custom dataset with good results. By building a platform that automatically performs feature engineering on rich types (e.g., Currency and Percentage rather than Double; Phone Number and Email Address rather than String), they have automated much of the work that consumes most data scientists' time. Learn how you can do the same by building a single model outline based on the application, and then having the framework customize it for each customer. Session hashtag: #SFds2
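The benefit of rich types can be sketched as follows. All names here are hypothetical, not the framework's actual API: the point is that a Currency or Phone Number value carries enough meaning for the platform to choose sensible feature transformations automatically, which a bare Double or String cannot.

```python
# Hypothetical sketch: richer feature types drive automatic feature
# engineering. Each typed value knows how to expand itself into model
# features, so one generic pipeline adapts to any customer's schema.

from dataclasses import dataclass

@dataclass
class Currency:
    amount: float
    def features(self):
        # A plain Double loses the "money" semantics; a Currency type can
        # add sensible defaults like coarse magnitude buckets.
        return {"amount": self.amount, "over_1k": float(self.amount > 1000)}

@dataclass
class PhoneNumber:
    raw: str
    def features(self):
        # A raw String is near-useless to a model; a PhoneNumber type can
        # contribute validity signals instead.
        digits = "".join(ch for ch in self.raw if ch.isdigit())
        return {"is_valid": float(10 <= len(digits) <= 15)}

def featurize(row):
    """One generic pipeline: each typed value expands itself."""
    out = {}
    for name, value in row.items():
        for k, v in value.features().items():
            out[f"{name}_{k}"] = v
    return out

row = {"deal_size": Currency(2500.0),
       "contact": PhoneNumber("(415) 555-0123")}
print(featurize(row))
# {'deal_size_amount': 2500.0, 'deal_size_over_1k': 1.0,
#  'contact_is_valid': 1.0}
```

Because the dispatch happens on the feature's type rather than on hand-written per-dataset code, the same model outline can be re-fit on every customer's schema without a data scientist engineering features by hand.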
Leveraging your data to make better decisions is something every business wants to do, but doing it correctly takes weeks or months of work from highly skilled, hard-to-find specialists. Many of the laborious steps a machine learning specialist follows when creating a custom application can, however, be automated given enough compute power and flexibility. By leveraging Spark to do machine learning at scale, Salesforce Einstein has created a system that lets individuals with domain knowledge, but no machine learning expertise, create high-quality, high-impact machine learning applications.