Alexis Roos - Databricks

Alexis Roos

Director, Data Science and Machine Learning, Salesforce

Alexis is director of data science and machine learning at Salesforce, where he leads a team of data scientists and engineers delivering intelligent services for the Einstein platform. Alexis has over 20 years of engineering and management experience, with the last 6 years focused on large-scale (tens of terabytes of data and billions of records) data science and engineering, including data preparation, entity resolution, distributed graph processing, machine learning, NLP and deep learning. Alexis started coding as a teen, became an avid 68000 programmer and pursued an MS in CS & Cognitive Sciences when AI was about expert systems.

UPCOMING SESSIONS

Deep Learning for Natural Language Processing Using Apache Spark and TensorFlow - Summit 2018

When interacting with customers, being able to extract relevant communications information in real time is critical for success. This presentation will illustrate how Salesforce is using Apache Spark and TensorFlow to monitor customer activities in real time and surface insights. Long Short-Term Memory (LSTM) networks have proven to be an effective technology for achieving state-of-the-art results on a variety of Natural Language Processing (NLP) tasks. When coupled with word embedding models, they naturally capture the temporal information and semantic meaning of human language. LSTM networks can be readily built using any of today's deep learning packages. However, most popular deep learning packages use Python as their native language, which presents a real challenge in productizing such technology, since production environments often rely on other technology stacks. In this talk, we will explain how to build an LSTM classifier using the TensorFlow framework, and how to combine the deep learning apparatus of TensorFlow with the distributed data processing power of Spark. We will discuss how to reuse existing Scala data preparation libraries in a TensorFlow training pipeline, unify them in a single notebook, and choose a strategy for scoring at runtime. We will show that pre-trained Word2Vec embeddings reduce the demand for large volumes of labeled data. The end result is a fast and accurate machine learning model for text classification that can be integrated into a structured streaming production environment. Session hashtag: #DLSAIS12

PAST SESSIONS

Building a Graph - Summit East 2016

Radius Intelligence (www.radius.com) empowers Data Science to deliver a unique marketing intelligence platform used by over a hundred US companies. This presentation will explain how Radius is using Spark along with GraphX, MLlib and Scala to create a comprehensive and accurate index of US businesses from dozens of different sources. In particular, I will address problems related to clustering records together based on a graph approach and how to resolve the graph into a set of US businesses. I will discuss some of the models used to clean out noise, rank the best values and impute missing values, and I will share some best practices.

Using Apache Spark for Intelligent Services - Summit East 2017

Salesforce is developing Einstein, an artificial intelligence (AI) capability built into the core of the Salesforce Platform. Einstein helps power the world’s smartest CRM to deliver advanced AI capabilities to sales, services, and marketing teams - helping them discover new insights, predict likely outcomes to power smarter decision making, recommend next steps, and automate workflows so users can focus on building meaningful relationships with every customer. Salesforce is using Apache Spark (batch, streaming, GraphX and ML) to power the Einstein platform and services. In this keynote and demo, Alexis will highlight how Salesforce is building intelligent services for Einstein using activity data by leveraging Spark and Databricks to scale data science and engineering.

Using AI for Providing Insights and Recommendations on Activity Data - Summit 2017

In the customer age, being able to extract relevant communications information in real time and cross-reference it with context is key. Learn how Salesforce Inbox is using data science and engineering to enable salespeople to monitor their emails in real time and surface insights and recommendations. Salesforce is developing Einstein, an artificial intelligence capability built into the core of the Salesforce Platform. Einstein helps power the world's smartest CRM to deliver advanced AI capabilities to sales, services, and marketing teams – allowing them to discover new insights, predict likely outcomes to power smarter decision making, recommend next steps, and automate workflows so users can focus on building meaningful relationships with every customer. Find out how Salesforce Einstein Inbox combines activity data, such as emails, with contextual and CRM data to provide real-time insights and recommended actions. Learn about use cases, architecture, and how a variety of technologies including data engineering, data science, graph processing, NLP, machine learning and deep learning are combined to support the application. This session will include an interactive demo where you'll get to see the associated code using notebooks running Spark. Session hashtag: #SFds6