Yong Liu is a Principal Data Scientist at Outreach.io, working on machine learning and data science solutions to problems arising from the sales engagement platform. Previously, he was with Maana Inc. and Microsoft. Prior to joining Microsoft, he was a Principal Investigator and Senior Research Scientist at the National Center for Supercomputing Applications (NCSA), where he led R&D projects funded by the National Science Foundation and Microsoft Research. Yong holds a PhD from the University of Illinois at Urbana-Champaign.
"The advent of pre-trained language models such as Google’s BERT promises a high-performance transfer learning (HPTL) paradigm for many natural language understanding tasks. One such task is email classification. Given the complexity of content and context in sales engagement, the lack of a standardized large corpus and benchmarks, limited labeled examples, and heterogeneous context of intent, this real-world use case poses both a challenge and an opportunity for adopting an HPTL approach. This talk presents an experimental investigation to evaluate transfer learning with pre-trained language models and embeddings for classifying sales engagement emails arising from digital sales engagement platforms (e.g., Outreach.io). We will present our findings on evaluating BERT, ELMo, Flair, and GloVe embeddings with both feature-based and fine-tuning-based transfer learning implementation strategies, and their scalability on a GPU cluster with a progressively increasing number of labeled samples. Databricks’ MLflow was used to track hundreds of experiments with different parameters, metrics, and models (TensorFlow, PyTorch, etc.). While this talk focuses on the email classification task, the approach described is generic and can be used to evaluate the applicability of HPTL to other machine learning tasks. We hope our findings will help practitioners better understand the capabilities and limitations of transfer learning and how to implement transfer learning at scale with Databricks for their scenarios."