Richard Zang - Databricks

Software Engineer, Databricks

Richard Zang is a software engineer on the ML Platform team at Databricks. He has a strong interest in, and extensive experience with, building data-intensive enterprise applications. Before Databricks, he worked at Hortonworks on Apache Ambari, and before that he worked at OpenText Analytics building its BI visualization suite. Richard holds an MS in Computer Science from the University of Chicago and a BE in Software Engineering from Sun Yat-Sen University.


Building Multistep Training & Deployment Workflows with MLflow

Summit 2020

Many organizations using machine learning struggle to maintain complex yet fragile training pipelines and to manage the large number of models those pipelines generate. To simplify the process, organizations tend to build custom "ML platforms" that glue together their discrete machine learning steps. However, such platforms are often limited to a few supported algorithms and tightly coupled to each company's internal infrastructure. MLflow, a new open-source project from Databricks, is designed to standardize and unify this process.

In this talk, I'll give an overview of MLflow, followed by a detailed introduction to two new components recently developed in MLflow: MLflow Multi Step Workflow and MLflow Model Registry.

MLflow Multi Step Workflow provides a set of APIs and a UI for creating a training workflow by defining a set of training steps and the dependencies among them. Each step can use a different language and framework, and can be configured to automatically reuse training results cached by previous workflow runs. Existing workflows can also be edited and rerun with new parameters, reusing as many previous training results as possible.

MLflow Model Registry provides a suite of APIs and an intuitive UI that organizations can use to register and share new models and to manage the lifecycle of their existing models. It is integrated with the existing MLflow Tracking component, so each registered model can be traced back to the run that logged its artifacts and to that run's source code, giving complete lineage across every model's lifecycle. Finally, MLflow Model Registry can be integrated with existing ML pipelines to deploy the latest version of a model to production.

A live demo will show how these new MLflow features simplify model training and model version management.
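The idea of a workflow built from dependent steps, with results reused across runs, can be illustrated with a minimal, hypothetical sketch. It does not use MLflow's API: `Step`, `cache_key`, `run_workflow`, and `make_steps` are names invented for illustration, every step is a plain Python function (the multi-language support described above is not modeled), and the cache is an in-memory dict standing in for results logged by earlier runs. Under these assumptions, a step is treated as reusable when its name, its parameters, and the cache keys of all its upstream steps are unchanged.

```python
import hashlib
import json
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """Hypothetical workflow step: a function plus its upstream steps and parameters."""
    name: str
    fn: Callable[..., Any]
    deps: List[str] = field(default_factory=list)
    params: Dict[str, Any] = field(default_factory=dict)


def cache_key(step: Step, upstream_keys: List[str]) -> str:
    # Hash the step name, its parameters, and its upstream keys, so any upstream
    # change also invalidates this step. The function body itself is NOT hashed.
    payload = json.dumps(
        {"name": step.name, "params": step.params, "upstream": upstream_keys},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def run_workflow(steps: List[Step], cache: Dict[str, Any], log: List[str]) -> Dict[str, Any]:
    by_name = {s.name: s for s in steps}
    # Visit steps in dependency order (predecessors first).
    order = TopologicalSorter({s.name: s.deps for s in steps}).static_order()
    results: Dict[str, Any] = {}
    keys: Dict[str, str] = {}
    for name in order:
        step = by_name[name]
        key = cache_key(step, [keys[d] for d in step.deps])
        keys[name] = key
        if key in cache:
            log.append(f"reuse {name}")  # result produced by an earlier run
        else:
            upstream = [results[d] for d in step.deps]
            cache[key] = step.fn(*upstream, **step.params)
            log.append(f"run {name}")
        results[name] = cache[key]
    return results


def make_steps(weight: int) -> List[Step]:
    # A toy three-step pipeline: load -> featurize -> train.
    return [
        Step("load", lambda n: list(range(n)), params={"n": 5}),
        Step("featurize", lambda data, scale: [x * scale for x in data], ["load"], {"scale": 2}),
        Step("train", lambda feats, weight: sum(feats) * weight, ["featurize"], {"weight": weight}),
    ]


cache: Dict[str, Any] = {}
log1: List[str] = []
run_workflow(make_steps(weight=1), cache, log1)  # first run: every step executes
log2: List[str] = []
out = run_workflow(make_steps(weight=3), cache, log2)  # rerun with a new parameter
print(log1)  # ['run load', 'run featurize', 'run train']
print(log2)  # ['reuse load', 'reuse featurize', 'run train']
print(out["train"])  # 60
```

Changing only `weight` on the final step reruns that step alone, while `load` and `featurize` are served from the cache. Changing a parameter on `load` would invalidate everything downstream, because each key folds in the keys of its upstream steps.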
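The registry's core bookkeeping (numbered versions, lifecycle stages, and a link back to the run that produced each version) can be sketched in a few lines. This is a hypothetical in-memory model, not MLflow's implementation: `ModelRegistry`, `register`, `transition`, `latest`, and the model name `"churn"` are illustrative, and the stage names are an assumption chosen to resemble MLflow's. In MLflow itself, registration goes through calls such as `mlflow.register_model`, and stage changes through `MlflowClient.transition_model_version_stage`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed stage names, chosen to resemble MLflow's; not read from MLflow.
STAGES = ("None", "Staging", "Production", "Archived")


@dataclass
class ModelVersion:
    name: str
    version: int
    source_run_id: str  # lineage: the tracking run that logged this model's artifacts
    stage: str = "None"


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry; not MLflow's implementation."""
    models: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, run_id: str) -> ModelVersion:
        # Each registration under the same name gets the next version number.
        versions = self.models.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, run_id)
        versions.append(mv)
        return mv

    def transition(self, name: str, version: int, stage: str, archive_existing: bool = True) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        for mv in self.models[name]:
            if mv.version == version:
                mv.stage = stage
            elif archive_existing and stage in ("Staging", "Production") and mv.stage == stage:
                # Retire whichever version previously held this stage.
                mv.stage = "Archived"

    def latest(self, name: str, stage: str) -> Optional[ModelVersion]:
        # Highest-numbered version currently in the given stage, or None.
        in_stage = [mv for mv in self.models.get(name, []) if mv.stage == stage]
        return max(in_stage, key=lambda mv: mv.version, default=None)


registry = ModelRegistry()
registry.register("churn", run_id="run-a")
registry.register("churn", run_id="run-b")
registry.transition("churn", 1, "Production")
registry.transition("churn", 2, "Production")  # version 1 is archived
prod = registry.latest("churn", "Production")
print(prod.version, prod.source_run_id)  # 2 run-b
```

In this sketch, a deployment job would ask for `latest("churn", "Production")` and load that version's artifacts, while `source_run_id` is what lets the version be traced back to its tracking run and source code.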