Last summer, Databricks launched MLflow, an open source platform to manage the machine learning lifecycle, including experiment tracking, reproducible runs and model packaging. MLflow has grown quickly since then, with over 120 contributors from dozens of companies, including major contributions from R Studio and Microsoft. It has also gained new capabilities such as automatic logging from TensorFlow and Keras, Kubernetes integrations, and a high-level Java API. In this talk, we’ll cover some of the new features that have come to MLflow, and then focus on a major upcoming feature: model management with the MLflow Model Registry. Many organizations face challenges tracking which models are available in the organization and which ones are in production. The MLflow Model Registry provides a centralized database to keep track of these models, share and describe new model versions, and deploy the latest version of a model through APIs. We’ll demonstrate how these features can simplify common ML lifecycle tasks.
Matei Zaharia is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009, and has worked broadly in datacenter systems, co-starting the Apache Mesos project and contributing as a committer on Apache Hadoop. Today, Matei tech-leads the MLflow development effort at Databricks. Matei’s research work was recognized through the 2014 ACM Doctoral Dissertation Award for the best PhD dissertation in computer science, an NSF CAREER Award and several best paper awards.