Productionizing ML Models Using MLflow Model Serving

May 27, 2021 11:00 AM (PT)


Productionizing ML models requires ensuring model integrity, efficiently replicating runtime environments across servers, and keeping track of how each of our models was created. This helps us better trace the root cause of changes and issues over time as we acquire new data and update our models, and gives us greater accountability over our models and the results they generate.

MLflow Model Serving delivers cost-effective, one-click deployment of models for real-time inference. The model versions deployed in Model Serving can also be conveniently managed with the MLflow Model Registry. We will cover three topics: deployment, consumption, and monitoring. For deployment, we will demo deploying different versions and validating the deployment. For consumption, we will demo connecting Power BI and generating a prediction report using an ML model deployed in MLflow Serving. Lastly, we will wrap up with managing MLflow Serving, including access rights and monitoring capabilities.

In this session watch:
Nitin Raj Soundararajan, Senior Associate, Cognizant Worldwide Limited
Nagaraj Sengodan, Senior Technical Manager, HCL Technologies

 

Transcript

Nagaraj Sengodan: Good morning and good afternoon to all, and thanks for joining this session. I think you have come to an interesting topic. Today we are going to cover productionizing an ML model using the MLflow Model Serving feature, which was introduced in MLflow close to a year ago. This is mainly for people who come from an ML engineering background, though data scientists face the same challenges. Building a model is one set of activities with many steps: engineering your features and finding the right ones to use, tuning hyperparameters, choosing the right model, training it, and validating it with cross-validation. After you spend enough time building your model, the way you serve it to end users is key, because that is what people are going to consume and where they get their experience.
When you want to expose your model as a service, especially for real-time consumption, it's always a challenge, because you have to create a REST API for your parameters to be passed in, with security enabled, for users to consume. Moreover, whenever you change your model, say from one framework to PyTorch or TensorFlow, quite a few libraries change, and keeping the end-user experience unchanged is always a challenge. That's where the new Databricks MLflow feature comes into the picture, which we are going to cover in this session. First, about myself: I'm Nagaraj, and I help build large enterprise-wide distributed systems. I've been in this space for close to 17 years. I'm part of an architecture engineering team, helping deliver enterprise analytics systems for medium and large organizations across different industries and domains.

Nitin Raj Soundararajan: Thank you, Nagaraj. Hi, this is Nitin Raj Soundararajan. I'm working as a Technical Consultant focusing on advanced analytics, data engineering, cloud-scale analytics, and data science, working primarily in the data and AI practice. This is the agenda of what we are going to cover today: first MLflow, then MLflow Serving, managing served versions, monitoring served models, and customizing serving clusters, and finally the demo and Q&A. So, MLflow. MLflow is an open-source platform for the machine learning lifecycle. MLflow is built on an open-interface philosophy, defining several key abstractions that allow existing infrastructure and machine learning algorithms to be integrated with the system easily. That means if you're a data scientist who wants to use MLflow with a particular framework that's currently unsupported, the open interface design makes it extremely easy to integrate that framework and start working with the platform effectively.
This means that MLflow is designed, in principle, to work with any machine learning library and any language. MLflow also facilitates reproducibility: the same training or production machine learning code is designed to produce the same results regardless of the environment, whether in the cloud, on a local machine, or in a notebook. Finally, MLflow is designed with scalability in mind, meaning it's just as useful for a small team of data scientists as it is for a large organization consisting of potentially thousands of machine learning practitioners. Now, the MLflow components. I'm going to walk through MLflow's four components: Tracking, Projects, Models, and the Registry. Tracking is a centralized repository for metadata about training sessions within the organization. Projects is a reproducible, self-contained packaging format for model training code, ensuring the training code runs the same way regardless of the execution environment.
An MLflow Model is a general-purpose model format, enabling any model produced with MLflow to be deployed to a variety of production environments. And the MLflow Registry helps solve three critical problems in production machine learning applications. It protects against deploying bad models by introducing model administration and review. It integrates with MLflow Tracking to provide a complete picture of every model within your organization, including source code, parameters, and metrics. It also provides a centralized activity log, recording the entire collaborative process from model development to deployment, with complete model descriptions and comments.
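As a rough illustration of how Tracking, Models, and the Registry fit together, here is a minimal scikit-learn sketch; the run name, the "iris-model" registry name, and the logged parameter and metric are illustrative, not taken from the speakers' notebook:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="iris-decision-tree"):
    clf = DecisionTreeClassifier()
    clf.fit(X_train, y_train)

    # Tracking: parameters, metrics, and the model artifact in one run.
    mlflow.log_param("model_type", "DecisionTreeClassifier")
    mlflow.log_metric("accuracy", clf.score(X_test, y_test))
    # Logging with registered_model_name also creates version 1 in the
    # Model Registry (assuming the name does not exist yet).
    mlflow.sklearn.log_model(clf, artifact_path="model",
                             registered_model_name="iris-model")
```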

Nagaraj Sengodan: Next, the MLflow architecture, a high-level overview we have tried to capture in a single diagram. If you look at it, the core MLflow framework sits on the left side. First you prepare your data: you take your data source and do some kind of cleansing and formatting with Apache Spark and Delta Lake. Then you build a model, using the ML runtime or any other machine learning libraries in different environments, trying many features to build the model that gives the best results. Once we have the model, deployment, monitoring, and scoring come into the picture, and that is where model serving comes in.
So how are you going to make your model available to end users? There are two options: one is online serving, and the other is batch and stream mode. Batch and stream scoring run as part of your pipelines, so we can pretty much handle those through code; online serving is different.
That's where it gets critical, because we have to expose the model as a service that downstream applications can consume accurately. In this session, we'll take a deep dive into MLflow's recently launched model serving feature, which covers these three parts; that is what we're going to cover. So, what is MLflow Model Serving? It exposes a REST API: whenever you build a model, register it in the Model Registry, and enable it for serving, it automatically creates a REST API by setting up a cluster behind the scenes, which you can configure the way you want. Without any intervention or developer effort, it builds its own cluster, deploys your model with its environment, and creates your REST API.
Using that REST API, which you can share with users, they can pass values in and get results back. Since this has been released recently, there are some limitations: right now it supports only a modest load, on the order of tens of queries per second, with about 99.5% availability. So, within what the cluster accommodates, use it for light loads, for testing your model or your serving setup. For anything business-critical, let's wait a while longer; once MLflow Serving is well matured, it will be suitable for business-critical applications too. If you look at how this works, we have different kinds of models: currently MLflow supports TensorFlow, XGBoost, Keras, Scikit-Learn, Spark MLlib, ONNX, H2O, and a few other flavors as well.
Using those flavors, you can build your model. That's one of the components within MLflow: models of different flavors are all registered in the MLflow Model format. Once you have the model, you can enable tracking. Tracking simply means that whenever you run your experiment, the parameters you passed, the dataset you used, and the results you got are all captured. Whenever we want reproducibility, these are the important factors, because what usually happens is we keep only the model, not the history of how the model came to be, and we cannot go back and see how it was produced. The tracking server is a really good benefit here, because it captures everything.
You can add metrics, artifacts, and custom artifacts as well; that's where the tracking server comes into the picture. The final component is the Model Registry. This is an important one, especially for release management, because you have the model and you want to move it through different stages, like staging and production. You can do A/B testing as well: you can have two variants, share them with users, see which one performs better, and then promote the winner to production. You can customize the stages in the Model Registry, so it's quite easy to handle, and you can do it through the REST API, through Python code, or whichever way suits you.
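A minimal sketch of such a stage transition, using the MlflowClient API and the illustrative "iris-model" name from the earlier sketch:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Find the newest version that has not been assigned a stage yet...
latest = client.get_latest_versions("iris-model", stages=["None"])[0]

# ...and promote it to Production, archiving any older Production version.
client.transition_model_version_stage(
    name="iris-model",
    version=latest.version,
    stage="Production",
    archive_existing_versions=True,
)
```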
Once the model is available in the Model Registry, how are you going to expose it for consumption, especially for real-time use cases? That is where model serving comes into the picture. Certainly, there are open-source and paid options, like Seldon Core and KFServing, so there are quite a few tools available, extensively optimized for model serving as a scoring service, as they call it. But if you want something simple for your testing, without much effort, I would recommend trying MLflow Model Serving, which comes built into your Databricks environment, whether you use AWS, GCP, or Azure. It comes with a built-in interface where, with a few clicks, you can set up a cluster for your model to deploy and run. Typically, it picks up all the environments.
Let's say you have staging and production, or whatever environments you have, like A/B testing and then production: it takes all the stages and makes each live with a URL. With the production URL, whenever the version changes, the URL remains the same, so the users who consume it seamlessly get the same experience. That's the good part of going with the model serving feature. To put it in a single diagram: whatever model you build, in whatever flavor, whether Keras, PyTorch, Caffe2, or anything else, you take that model and publish it to the Model Registry.
Seamlessly, you can push different versions of the model into the Model Registry, then move them across stages, like a staging A or B, and once you've decided which one is the best fit, you move that A or B candidate into production. Automatically, the model server will update the version and expose the REST API. The REST API remains the same, so users never see any difference. As soon as a version is moved into production, model serving updates the model behind the scenes, and when you next send records, you get the new model's outcome. That's model serving. It's probably a good time for us to get into the demo and see how it works in a real-world scenario.
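For reference, the stage- and version-based URLs described here follow a fixed pattern on Databricks-hosted MLflow Model Serving; this is a sketch with a placeholder workspace host and the illustrative model name:

```
https://<databricks-instance>/model/<registered-model-name>/<version-or-stage>/invocations

# Score whatever version currently sits in Production (stable across upgrades):
https://<databricks-instance>/model/iris-model/Production/invocations

# Or pin a specific version explicitly:
https://<databricks-instance>/model/iris-model/2/invocations
```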
In this demo, I'm going to walk you through the MLflow Serving feature. For that, I'm using the Iris classifier, a well-known example with three different species, where we classify flowers into the three species based on the sepal and petal length and width parameters. Here I'm using the scikit-learn framework: first a decision tree classifier, and then a KNN classifier, so you can see, whenever I improve the model, the impact on the user experience; that's what I'm going to explain from the MLflow Serving perspective. So let me start with the decision tree. If you look at the code, I'm just using scikit-learn and importing the DecisionTreeClassifier.
And this is my data; I have some sample data I put into the notebook itself. Based on the values, the first set of values refers to setosa, the middle set are versicolor, and the last ones are virginica. These are the three sets of records I've created, and I'm going to train my model on them. Before that, let's look at how the data splits: the red points are setosa and the gray ones are virginica, so you can see how the different classes spread out. I'm using MLflow to capture this run, since the same MLflow gives us experiment tracking. We track the complete runs, so we can see how the model performs, and then we can enable it for model serving, which is very good to see.
So I'm going to run this code with Run All. Once MLflow captures it, it's going to register the model; I'm registering it under the name Iris model, and once it's registered, I'm moving it to the Production stage. The code has now completed successfully, so let me go to the models screen. Here you can see there's a new model called Iris model from the run: version one, which I just ran from the code, is the latest version and has been pushed to production.
I'm going into the Iris model page, where everything that was tracked, all the parameters, is available; that's the tracking feature. And you can see there's a new tab called Serving. This is the easiest way to enable and package your model for users to consume through a REST API, especially for online serving; it's one of the nicest features, I would say. If I enable it, you can see it automatically sets up the cluster and then exposes the REST API for users to consume, without any development effort, which is a really good thing. It also captures the events that happen on the model, and you have your URL.
This is your REST API URL. If you want to invoke it through the browser, just to test it, you can test it from here, or you can use curl or Python. The thing is, you need a Databricks token, which is for security. You can create your own token in Databricks, set the expiry date, get the token, and pass it here; that authorizes the request to get predictions from the model. And you pass your URL.
You can use either a version number or Production in the URL. My recommendation is always to use Production, because it is tied to the versions you promote through your registry: if you say Production, whichever versions are available in production, it always picks the latest one. That way, from the consumer's perspective, there is no need to change any code at all. All they need to do is call this REST API; behind the scenes it automatically finds the version available in production, routes the request to the corresponding model, and sends the response back to the end users.
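Putting those pieces together, here is a hedged sketch of scoring the Production endpoint from Python. The host, token, and model name are placeholders, and the pandas "split" payload orientation is an assumption based on the legacy Databricks serving format:

```python
import requests

DATABRICKS_HOST = "https://<databricks-instance>"   # placeholder workspace URL
TOKEN = "<databricks-personal-access-token>"        # placeholder token

# Stage-based URL: always resolves to the latest Production version.
url = f"{DATABRICKS_HOST}/model/iris-model/Production/invocations"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Two records in pandas "split" orientation: sepal and petal length/width.
payload = {
    "columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
    "data": [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]],
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())  # e.g. a list of predicted class labels
```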
And you can see now my cluster is ready. What MLflow does here is package my model, deploy it into the cluster, and create the HTTP endpoint for the cluster, which is shown here. It has an input and an output: the input is the parameters I'm going to pass, like the sepal and petal length and width, and the output is the value saying which Iris species it is. Okay, now you can see both have turned green, which means my model is readily available in the cluster for consumption, and anyone can send an HTTP request and get results. So let me pick some sample data to show you.
Okay. So here you can see I have two sets of data: a first set of values and then a second. I'm going to pass both records and see what values version one, the decision tree, predicts. I just pass in some random values for the sepal and petal length and width, click Send Request, and you can see both come back as the virginica Iris species.
Behind the scenes, if you scroll down, you can look at the logs and see what happened from the beginning: how the model was deployed. All the logs are captured, version by version, so if any version is deployed or any change happens, say a new version is deployed, it will be captured here; and if a new version with better accuracy is deployed, I can see that as well. Now, you can see these values both come out as virginica; that's what this model predicts. And if you go to the model itself, you can see the accuracy is 0.6, so that's the accuracy for this decision tree model.
So this is quite easy: I build a model, and I can enable it for users to consume without much work, from an ML engineering perspective as well as a data scientist perspective; it makes things much easier for me. If you want to change the cluster, you have options under Cluster Settings. Here you can see different cluster types available: you can choose what kind of VM you need, whether memory optimized, compute optimized, or storage optimized, depending on your needs, and save that cluster type. Then whenever you serve your model, the cluster automatically picks up whatever settings you made on this screen, so it's quite easy, I would say; anyone can set up the cluster and it pretty much runs well. That's the MLflow Serving feature. Now my colleague is going to run the next version and show how the transition happens. Over to you.

Nitin Raj Soundararajan: Now we are going to improve the accuracy of the model. The earlier one, with the classic decision tree algorithm, gave us 60% accuracy; now we are going to improve on that by using the KNN algorithm. In this notebook, we have used the KNeighborsClassifier from the scikit-learn library, with the same sort of code and the same sample data to train the model. So now we are going to run this notebook and see the improvement in accuracy. I'm just clicking Run All.
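Continuing the earlier sketches (same data, imports, and illustrative model name), the version-2 run might look like the following; registering under the same name is what creates version 2 in the registry automatically:

```python
from sklearn.neighbors import KNeighborsClassifier

with mlflow.start_run(run_name="iris-knn"):
    # n_neighbors is an illustrative hyperparameter, not from the talk.
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)

    mlflow.log_metric("accuracy", knn.score(X_test, y_test))
    # Same registered name as version 1, so this becomes version 2.
    mlflow.sklearn.log_model(knn, artifact_path="model",
                             registered_model_name="iris-model")
```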
We can see that there is an improvement to 94 percent in the accuracy of the model, as we are now using the KNN algorithm. The notebook has completed, so now let us go into the model and look at the serving. We can see that in the Iris model, version two, which we just ran, has been deployed, and version two is now in Production. So I'm going into Serving, and under Serving we can see that version two is still pending; it will take another couple of minutes to move to the ready state.
Yeah. Now we can see that version two has moved into the ready state, and we are good to go with our testing: giving sample input in the request box so we can see the response. Let us first give the sample input to version one and click Send Request. What we can see is that both come out as virginica in version one for the given input. Now let me use the same input with version two, which we trained just now with the KNN model: I put the same set of inputs in the request box and click Send Request. Here we can see the output is one and two, which is nothing but versicolor and virginica. This is because of the improvement in model two, to 94 percent accuracy; that is the key point we are seeing here.
And even though we have run two different models, there is no change in the consumption link, the request URL. The request URL always points at Production; there is no change in versions or model names that we need to specify in the request URL. We always point to the Production URL, which gives us the most recent version pushed to production. And in the log section, we can see the recorded activity for model two: model two has been promoted to production.
As we have seen in the demo, let's look at managing served versions. All active, non-archived model versions are deployed, and you can query them using their URLs. Databricks automatically deploys new model versions when they are registered and automatically removes old versions when they are archived. Next, managing model access rights: model access rights are inherited from the Model Registry, so enabling or disabling the serving feature, likewise, follows the permissions managed on the registered model, and anyone with read rights can score any of the deployed versions. Then, scoring the deployed model versions: to score a deployed model, you can use the UI or send a REST API request to the model URL. The UI is the easiest and fastest way to test the model; you can insert the model input data in different formats and then send the request to the service.
If the model has been logged with an input example, you can load that example and see the corresponding output. For scoring via the REST API, you send a scoring request through the REST API using standard Databricks authentication. Moving on to monitoring served models: the serving page displays status indicators for the serving cluster as well as for the individual model versions. In addition, you can use the following to obtain further information. To inspect the state of the serving cluster, use the Model Events tab, which displays a list of all serving events for this model; to inspect the status of a single model version, use the Logs or Version Events tabs on the model version tab. Finally, customizing the serving cluster: to customize the serving cluster, use the Cluster Settings tab on the Serving tab. To modify the memory size and the number of cores of a serving cluster, use the Instance Type drop-down menu to select the desired cluster configuration. You can also add tags to registered models to refer to later in other places, and edit or delete existing tags.
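As a short sketch of the archiving behavior described above, transitioning a version to Archived through the same hypothetical client removes it from the serving endpoint:

```python
# Archiving version 1 of the illustrative "iris-model"; per the behavior
# described above, Databricks then removes it from the serving cluster
# while the Production URL keeps resolving to the remaining live version.
client.transition_model_version_stage(
    name="iris-model",
    version=1,
    stage="Archived",
)
```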
So that's all from the model serving perspective. Please raise your questions, and we will answer them.

Nitin Raj Soundararajan

Nitin Raj Soundararajan is a technical architect focusing on advanced data analytics, data engineering, cloud scale analytics and data science to solve real business problems in multiple domains. He ...

Nagaraj Sengodan

Nagaraj Sengodan is a Senior Technical Manager in Data and Analytics Practice at HCL Technologies, where he brings over 15 years of industry experience in data engineering and analytics. He has arc...