Yong Liu

Principal Data Scientist, Outreach.io

Yong Liu is a Principal Data Scientist at Outreach.io, working on machine learning and data science solutions to problems arising from the sales engagement platform. Previously, he was with Maana Inc. and Microsoft. Prior to joining Microsoft, he was a Principal Investigator and Senior Research Scientist at the National Center for Supercomputing Applications (NCSA), where he led R&D projects funded by the National Science Foundation and Microsoft Research. Yong holds a PhD from the University of Illinois at Urbana-Champaign.

UPCOMING SESSIONS

Continuous Delivery of Deep Transformer-based NLP Models Using MLflow and AWS Sagemaker for Enterprise AI Scenarios
Summit 2020

Transformer-based pretrained language models such as BERT, XLNet, RoBERTa and ALBERT have significantly advanced the state of the art in NLP and opened doors to solving practical business problems with high-performance transfer learning. However, operationalizing these models with production-quality continuous integration/delivery (CI/CD) end-to-end pipelines that cover the full machine learning life cycle stages of train, test, deploy and serve, while managing the associated data and code repositories, is still a challenging task. In this presentation, we will demonstrate how we use MLflow and AWS Sagemaker to productionize deep transformer-based NLP models for guided sales engagement scenarios at the leading sales engagement platform, Outreach.io.

We will share our experiences and lessons learned in the following areas:

  1. A publishing/consuming framework to effectively manage and coordinate data, models and artifacts (e.g., a vocabulary file) across the different machine learning stages
  2. A new MLflow model flavor that supports deep transformer models for logging and loading them at different stages (a sketch of such a flavor follows below)
  3. A design pattern that decouples model logic from deployment configurations and model customizations for a production scenario, using the MLproject entry points train, test, wrap and deploy (see the driver sketch below)
  4. A CI/CD pipeline that provides continuous integration and delivery of models into a Sagemaker endpoint serving production usage (see the deployment sketch below)
We hope our experiences will be of great interest to the broad business community actively working on enterprise AI scenarios and digital transformation.
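
As an illustration of item 2, here is a minimal sketch of how a deep transformer model could be wrapped as a custom MLflow pyfunc model. It assumes the Hugging Face transformers library; the class name, artifact path and model directory are hypothetical, not the flavor actually built at Outreach.io.

```python
# Hedged sketch: wrapping a fine-tuned transformer classifier as a custom
# MLflow pyfunc model so it can be logged, loaded and served uniformly.
# All names and paths below are illustrative assumptions.
import mlflow.pyfunc


class TransformerClassifier(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # MLflow resolves the artifact paths logged below at load time,
        # so the vocabulary/tokenizer files travel with the model.
        from transformers import (AutoModelForSequenceClassification,
                                  AutoTokenizer)
        self.tokenizer = AutoTokenizer.from_pretrained(
            context.artifacts["model_dir"])
        self.model = AutoModelForSequenceClassification.from_pretrained(
            context.artifacts["model_dir"])

    def predict(self, context, model_input):
        import torch
        # model_input is assumed to be a pandas DataFrame with a "text" column.
        encoded = self.tokenizer(list(model_input["text"]), padding=True,
                                 truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = self.model(**encoded).logits
        return logits.argmax(dim=-1).numpy()


# Log the wrapper together with the fine-tuned weights and vocabulary so
# every later stage (test, wrap, deploy) consumes the same versioned bundle.
mlflow.pyfunc.log_model(
    artifact_path="intent_classifier",
    python_model=TransformerClassifier(),
    artifacts={"model_dir": "fine_tuned_bert/"},  # hypothetical local directory
)
```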
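For item 3, the entry points can be chained from a thin driver so that deployment configuration stays outside the model logic. This is a minimal sketch assuming a local MLproject file that defines train, test, wrap and deploy entry points; the parameters are placeholders.

```python
# Hedged sketch of a CI/CD driver that runs the MLproject entry points in
# order. The project URI and parameters are assumptions, not the exact
# pipeline described in the talk.
import mlflow

STEPS = [
    ("train", {"epochs": 3}),          # fine-tune the transformer
    ("test", {}),                      # evaluate against a held-out set
    ("wrap", {}),                      # wrap the raw model as a pyfunc flavor
    ("deploy", {"stage": "staging"}),  # push to the serving environment
]

for entry_point, params in STEPS:
    # synchronous=True makes a failing step raise, halting the pipeline
    # so a broken model never reaches the deploy step.
    mlflow.run(uri=".", entry_point=entry_point, parameters=params,
               synchronous=True)
```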
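And for item 4, the delivery end of such a pipeline might look like the following. Note that mlflow.sagemaker.deploy reflects the MLflow 1.x-era API (newer releases route this through the mlflow.deployments plugin), and every name, ARN and image URL here is a placeholder.

```python
# Hedged sketch: deploying a registered model version to a SageMaker
# endpoint with MLflow's SageMaker integration (MLflow 1.x-style API).
# All identifiers below are placeholders.
import mlflow.sagemaker

mlflow.sagemaker.deploy(
    app_name="intent-classifier",  # becomes the SageMaker endpoint name
    model_uri="models:/intent_classifier/Production",
    execution_role_arn="arn:aws:iam::123456789012:role/sagemaker-role",
    image_url="123456789012.dkr.ecr.us-west-2.amazonaws.com/mlflow-pyfunc:latest",
    region_name="us-west-2",
    mode="replace",  # replace the live endpoint, giving continuous delivery
)
```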

PAST SESSIONS

High Performance Transfer Learning for Classifying Intent of Sales Engagement Emails: An Experimental Study
Summit 2019

The advent of pre-trained language models such as Google's BERT promises a high-performance transfer learning (HPTL) paradigm for many natural language understanding tasks. One such task is email classification. Given the complexity of content and context in sales engagement, the lack of a standardized large corpus and benchmarks, the limited number of labeled examples and the heterogeneous context of intent, this real-world use case poses both a challenge and an opportunity for adopting an HPTL approach. This talk presents an experimental investigation that evaluates transfer learning with pre-trained language models and embeddings for classifying sales engagement emails arising from digital sales engagement platforms (e.g., Outreach.io). We will present our findings from evaluating BERT, ELMo, Flair and GloVe embeddings with both feature-based and fine-tuning-based transfer learning implementation strategies, and their scalability on a GPU cluster with a progressively increasing number of labeled samples. Databricks' MLflow was used to track hundreds of experiments with different parameters, metrics and models (TensorFlow, PyTorch, etc.); a minimal sketch of this tracking pattern follows below. While this talk focuses on the email classification task, the approach described is generic and can be used to evaluate the applicability of HPTL to other machine learning tasks. We hope our findings will help practitioners better understand the capabilities and limitations of transfer learning and how to implement transfer learning at scale with Databricks for their scenarios.
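
As a rough illustration of the tracking pattern described above, a grid of embeddings and sample sizes can be logged as individual MLflow runs. The embedding names, sample sizes and training helper here are hypothetical stand-ins, not the actual study configuration.

```python
# Minimal sketch of tracking a transfer learning experiment grid with
# MLflow. All values and the training helper are illustrative placeholders.
import mlflow


def train_and_evaluate(embedding: str, n_samples: int) -> float:
    """Hypothetical stand-in for a feature-based or fine-tuning run."""
    return 0.0  # would return a real validation metric


for embedding in ["bert", "elmo", "flair", "glove"]:
    for n_samples in [100, 500, 1000, 5000]:
        # One MLflow run per (embedding, sample size) cell of the grid,
        # so hundreds of experiments stay comparable in the tracking UI.
        with mlflow.start_run(run_name=f"{embedding}-{n_samples}"):
            mlflow.log_param("embedding", embedding)
            mlflow.log_param("labeled_samples", n_samples)
            mlflow.log_metric("accuracy",
                              train_and_evaluate(embedding, n_samples))
```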