Chengyin Eng is a data science consultant at Databricks, where she implements data science solutions and delivers machine learning training to cross-functional clients. She received her M.S. in Computer Science from the University of Massachusetts, Amherst. She completed her B.A. in Environmental Studies and Statistics at Mount Holyoke College and spent her college years applying statistical modeling techniques to forest research. Thereafter, she worked in the life insurance industry and provided pro-bono data science services to NGOs. Outside data science, Chengyin enjoys reading and visiting outdoor markets.
May 27, 2021 11:35 AM PT
Deploying machine learning models has become a relatively frictionless process. However, properly deploying a model with a robust testing and monitoring framework is a vastly more complex task. There is no one-size-fits-all solution for productionizing ML models; it often requires custom implementations that utilize multiple libraries and tools. There is, however, a core set of statistical tests and metrics one should have in place to detect phenomena such as data drift and concept drift, which can otherwise cause models to become unknowingly stale and detrimental to the business.
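As a minimal sketch of what such a statistical check might look like, the two-sample Kolmogorov–Smirnov test from SciPy can compare a feature's training-time distribution against incoming production data. The arrays below are synthetic stand-ins for illustration; the 0.05 significance threshold is a common convention, not a prescription from this talk.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins: a reference (training-time) sample of one feature,
# and a production sample whose mean has shifted by 0.5 standard deviations.
reference = rng.normal(loc=0.0, scale=1.0, size=1000)
production = rng.normal(loc=0.5, scale=1.0, size=1000)

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the
# production feature no longer follows the training-time distribution.
statistic, p_value = stats.ks_2samp(reference, production)
drift_detected = p_value < 0.05
```

In practice a check like this would run per feature on a schedule, with the threshold tuned to balance false alarms against missed drift.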
Drawing on our experiences working with Databricks customers, we take a deep dive into how to test your ML models in production using open source tools such as MLflow, SciPy and statsmodels. You will come away from this talk armed with the key tenets of testing both model and data validity in production, along with a generalizable demo that uses MLflow to support the reproducibility of this process.