Adriana has been working as a Data Scientist with Data Insights for the past five years. Her role involves supporting customers with the development of ML models and seeing them through to full production deployment.
November 17, 2020 04:00 PM PT
In this talk, we will present how we tied Python together with Databricks and MLflow to productionalize a machine learning pipeline.
Through the deployment of a fairly standard classification model, we will show what a machine learning pipeline in production could look like. The project consists of two pipelines: training and prediction. We use an S3 bucket as the data source. The training pipeline trains various models on the data, registers them in MLflow, and stores all metrics and hyperparameters. Using grid search, the best model is chosen and moved to the Production stage in MLflow. The Production model can then be deployed using Flask, or as a UDF if we want to process data in batch. The prediction pipeline then uses the deployed model to make predictions, whether on demand or in batch.
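The selection step described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: `ToyRegistry` is an in-memory stand-in for a model registry (it is not MLflow's API), `evaluate` is a made-up scoring function in place of real training and validation, and the hyperparameter names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Run:
    """One training run: its hyperparameters, its metric, and its stage."""
    params: dict
    metric: float
    stage: str = "None"


@dataclass
class ToyRegistry:
    """In-memory stand-in for a model registry (hypothetical, not MLflow)."""
    runs: list = field(default_factory=list)

    def log_run(self, params, metric):
        # Record the hyperparameters and metric of a single run.
        run = Run(params=params, metric=metric)
        self.runs.append(run)
        return run

    def promote_best(self):
        # Mark the highest-scoring run as Production, archive the rest.
        best = max(self.runs, key=lambda r: r.metric)
        for run in self.runs:
            run.stage = "Production" if run is best else "Archived"
        return best


def evaluate(params):
    """Hypothetical score; a real pipeline would train and validate a model."""
    return 1.0 - abs(params["max_depth"] - 4) * 0.1 - abs(params["learning_rate"] - 0.1)


def grid_search(grid, registry):
    # Try every combination in the grid, log each run, then promote the best.
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        registry.log_run(params, evaluate(params))
    return registry.promote_best()


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1]}
registry = ToyRegistry()
best = grid_search(grid, registry)
print(best.params, best.stage)
```

In a setup like the one the talk describes, the logging and promotion would presumably go through MLflow's tracking and registry rather than an in-memory list; the sketch only shows the control flow.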
The whole project is packaged as a library, which can be installed anywhere, and the pipelines can easily be configured through configuration files.
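As a rough illustration of configuration-driven pipelines, the sketch below parses a small JSON configuration into a typed object that a pipeline entry point could consume. The talk does not specify the configuration format or its keys, so the JSON layout, the field names (`pipeline`, `source_uri`, `model_name`, `batch`), and the example bucket path are all assumptions.

```python
import json
from dataclasses import dataclass

VALID_PIPELINES = {"training", "prediction"}


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical settings for one pipeline run."""
    pipeline: str        # "training" or "prediction"
    source_uri: str      # where the input data lives (assumed S3-style URI)
    model_name: str      # name under which the model is registered
    batch: bool = False  # batch scoring vs. on-demand prediction


def load_config(text):
    """Parse JSON text into a PipelineConfig, rejecting unknown pipelines."""
    raw = json.loads(text)
    if raw.get("pipeline") not in VALID_PIPELINES:
        raise ValueError(f"unknown pipeline: {raw.get('pipeline')!r}")
    return PipelineConfig(**raw)


# Example configuration text; in practice this would be read from a file.
EXAMPLE_CONFIG = """
{
  "pipeline": "prediction",
  "source_uri": "s3://example-bucket/input/",
  "model_name": "example-classifier",
  "batch": true
}
"""

config = load_config(EXAMPLE_CONFIG)
print(config)
```

Keeping the settings in one immutable object means the same installed library can run either pipeline by swapping the configuration file, without code changes.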
Speaker: Adriana Menegozzo