Building a machine learning model is an iterative process. A data scientist will build tens to hundreds of models before arriving at one that meets the acceptance criteria. However, model building today is ad hoc, and there is no practical way for a data scientist to manage the models built over time. Nor is there a way to run complex queries over models and their related data.
In this talk, we present ModelDB, a novel end-to-end system for managing machine learning (ML) models. Using client libraries, ModelDB automatically tracks and versions ML models in their native environments (e.g., spark.ml, scikit-learn). A common set of abstractions enables ModelDB to capture models and pipelines built across different languages and environments. The structured representation of models and metadata then provides a platform for users to issue complex queries across various modeling artifacts. Our rich web frontend provides a way to query ModelDB at varying levels of granularity.
ModelDB has been open-sourced at https://github.com/mitdbg/modeldb.
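To make the idea of structured, queryable model metadata concrete, here is a minimal sketch in Python. It is not ModelDB's client API: the names `ModelRecord`, `ModelStore`, `log`, `query`, and `best` are hypothetical, the store lives in memory, and the hyperparameters and accuracy values are made-up placeholders rather than measured results. It only illustrates how recording each model under a common schema lets models from different environments be filtered and compared.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ModelRecord:
    """One tracked model version (hypothetical schema, not ModelDB's)."""
    model_id: int
    framework: str                      # e.g. "scikit-learn" or "spark.ml"
    model_type: str
    hyperparams: Dict[str, Any] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


class ModelStore:
    """In-memory stand-in for a model management backend (assumption)."""

    def __init__(self) -> None:
        self._records: List[ModelRecord] = []

    def log(self, framework: str, model_type: str,
            hyperparams: Dict[str, Any],
            metrics: Dict[str, float]) -> ModelRecord:
        # Assign sequential ids so every logged model is a distinct version.
        record = ModelRecord(len(self._records) + 1, framework, model_type,
                             dict(hyperparams), dict(metrics))
        self._records.append(record)
        return record

    def query(self, predicate: Callable[[ModelRecord], bool]) -> List[ModelRecord]:
        # Return every record for which the predicate holds, in logging order.
        return [r for r in self._records if predicate(r)]

    def best(self, metric: str) -> ModelRecord:
        # Highest value of the given metric among records that report it.
        candidates = [r for r in self._records if metric in r.metrics]
        return max(candidates, key=lambda r: r.metrics[metric])


# Placeholder values for illustration only.
store = ModelStore()
store.log("scikit-learn", "LogisticRegression", {"C": 1.0}, {"accuracy": 0.81})
store.log("scikit-learn", "RandomForestClassifier", {"n_estimators": 200}, {"accuracy": 0.86})
store.log("spark.ml", "GBTClassifier", {"maxDepth": 5}, {"accuracy": 0.84})

accurate = store.query(lambda r: r.metrics.get("accuracy", 0.0) > 0.83)
print([r.model_type for r in accurate])
print(store.best("accuracy").model_type)
```

Because every record shares one schema regardless of framework, a single predicate can span scikit-learn and spark.ml models alike. A real system would persist the records and capture them automatically from the training code instead of relying on explicit `log` calls.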
Manasi Vartak is the founder and CEO of Verta, an MIT spinoff building an open-core MLOps platform for the full ML lifecycle. Verta grew out of Manasi's Ph.D. work at MIT on ModelDB, the first open-source model management system deployed at Fortune 500 companies. The Verta MLOps platform enables data scientists and ML engineers to robustly take trained ML models through the end-to-end MLOps cycle, including versioning, packaging, release, operations, and monitoring. Previously, Manasi worked on feed ranking at Twitter and dynamic ad targeting at Google. Manasi has spoken at several top research and industry conferences, including the Strata O’Reilly Conference, SIGMOD, VLDB, Spark Summit, and AnacondaCON, and has authored a course on model management.