Andrew Brooks

Senior Data Scientist,

Andrew is a Senior Data Scientist at where he focuses on developing and deploying NLP systems to provide intelligence and automation to sales workflows. Previously Andrew was a Data Scientist at Capital One working on speech recognition and NLP and Elder Research consulting in domains spanning government, fraud, housing, tech and film. Before discovering machine learning, Andrew was an aspiring Economist at the Federal Reserve Board forecasting macro trends in Emerging Markets. Andrew holds a MS in Mathematics and Statistics from Georgetown University and BS & BA in Economics and International Studies from American University.



Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS Sagemaker for Enterprise AI ScenariosSummit 2020

Transformer-based pretrained language models such as BERT, XLNet, Roberta and Albert significantly advance the state-of-the-art of NLP and open doors for solving practical business problems with high performance transfer learning. However, operationalizing these models with production-quality continuous integration/ delivery (CI/CD) end-to-end pipelines that cover the full machine learning life cycle stages of train, test, deploy and serve while managing associated data and code repositories is still a challenging task. In this presentation, we will demonstrate how we use MLflow and AWS Sagemaker to productionize deep transformer-based NLP models for guided sales engagement scenarios at the leading sales engagement platform,

We will share our experiences and lessons learned in the following areas:

  1. A publishing/consuming framework to effectively manage and coordinate data, models and artifacts (e.g., vocabulary file) at different machine learning stages
  2. A new MLflow model flavor that supports deep transformer models for logging and loading the models at different stages
  3. A design pattern to decouple model logic from deployment configurations and model customizations for a production scenario using MLProject entry points: train, test, wrap, deploy.
  4. A CI/CD pipeline that provides continuous integration and delivery of models into a Sagemaker endpoint to serve the production usage
We hope our experiences will be of great interest to a broad business community who are actively working on enterprise AI scenarios and digital transformation.

Andrew Brooks