Hear key learnings from building AI platforms for deploying ML models to production at scale. This includes a fully automated continuous delivery process and systematic measures to minimize the cost and effort required to sustain the models in production. The talk includes examples from different business domains and deployment scenarios, covering their architecture, which is based in most cases on a streaming microservice architecture optimized for easy deployment with Docker containers and Kubernetes. This method offers a good separation of concerns, as data scientists don’t have to care about engineering aspects that are not part of their expertise. A data scientist can just push the model, which is code, while complying with some standards – and the rest will happen automagically. This code will be built, tested, deployed, and activated in an AI platform that already has all the integration hooks to the business domain. In addition, it offers cool manageability aspects that help track and maintain the model in production and reduce its total cost of ownership. This includes features such as applicative monitoring, model health indicators, retraining of models, and more.
– Hi, I’m Moty Fania and I’m the CTO of the Advanced Analytics organization at Intel IT, and I would like to share how we enable push-button productization of AI models at Intel. I would like to start by explaining what AI means to us. From our perspective, it’s all about being able to mimic a human decision with an algorithm, a decision that requires judgment, obviously, not just a simple automation. But once you are able to make an intelligent decision using an algorithm, it’s highly powerful, because then you can solve many bottlenecks that are related to scaling.
You can scale an algorithm, obviously, without any limitation, and by that achieve many things that you don’t have enough headcount to do. Today, we don’t have the ability to have algorithms that solve problems in a very broad domain, or in many domains, so obviously we’re talking about algorithms that are very narrow, and this is why it’s called Weak AI. But still, this is very powerful, because you can have many such narrow AI algorithms, and by that deal with or solve a very broad spectrum of problems. A bit about the organization that I belong to, as context for what I’m going to present.
So the organization that I am part of is actually called Advanced Analytics. It works across Intel in order to transform critical work using AI. We are structured as verticals, and it means that, systematically, each of these verticals works with a specific business domain from the initiation of an idea until the deployment to production and the sustaining afterwards. And each of these verticals has the whole set of skills that are required: data scientists, machine learning engineers, product analysts, in order to do what they need. So I’ll just go very briefly through them. One of them is related to optimizing power and performance in Intel’s product chips, so actually algorithms that work within the CPU, or in its very close surroundings, in the drivers, in order to optimize this balance. Hardware validation is an example where we don’t have enough validation engineers to validate the designs, which are becoming more and more complex, and it’s very hard to track bugs that appear only in a very rare combination of events. So being able to do that in an efficient way is important, and we are using AI in order to do that. Another vertical deals with what happens once a unit is manufactured: how we intelligently apply decisions like how to test each unit, how to pack it, how to fuse it. In the past we treated all of the units as if they were the same, and now we can use the history, or the trail, of each unit in order to make much more personalized decisions that can be very beneficial in many aspects. Industrial AI in manufacturing is all about improving the yield during the manufacturing process. In sales, we are attempting to use the capabilities of AI to help the existing sales workforce, but since we don’t have enough people, we are also using it in order to create autonomous sales, meaning many accounts that are being covered using AI. And I will say more about it later on as an example.
And Mobileye is all about harnessing the data that the cars on the road see; there are many cameras collecting a lot of data, and we can benefit from that, obviously. So this was the background, and now let’s go and consider the life cycle of an AI model.
So obviously it means that you first have to generate a model, right? This is the first step, but it’s not enough; I mean, you want to do it as fast as possible, but having a model is not enough. Once you have a model, you would like to make sure that you can productize it as soon as possible. And this means you really want to do it with minimal delay, in minutes, like any software where you apply continuous delivery; there’s no reason why we should treat models differently. Once you deploy a model to production, and it’s active there and generating value, you are not done, obviously, because we know for a fact that models degrade over time, and if you don’t track them and sustain them in production, there will be a point where they will be providing wrong results and even creating damage. So the challenge here is to make sure that we are able to do this sustaining in a very efficient way, so we can still focus most of our workforce, most of our data scientists and machine learning engineers, on the new projects, the new problems to solve, and not on dealing with hundreds of solutions that we deployed in the past. And actually we are able to do it in a very nice way today; we’re talking about less than five percent of the effort invested in sustaining, which is a good number, and this is the number that we are trying to achieve and maintain over time. So what are our goals, what are our objectives, when we are talking about AI?
So I mentioned already the aspects of continuous delivery, but let’s talk about other aspects. One thing that we want to ensure is that in the process of producing the model, our data scientists benefit from proper experiment tracking and, when needed, are able to fully reproduce their process; so reproducibility and tracking is one thing. Another aspect: we want to minimize the effort that is required once a model is there and the data scientist has completed it. Traditionally the work would be handed to another person, a machine learning engineer, in order to do some work to productize it. And we want to reduce this to a minimum, to zero; I mean, if the data scientist could potentially be responsible for taking the model that he generated to production, that would be optimal. And I will describe what we have done to enable it. I mentioned CI/CD: models are code, and there’s no reason to treat them differently than other code, so we should be able to do that. We wanted to enable a scalable and flexible inference system. This is the production, actually, this is the real production, and you want to make sure that it is fault tolerant, scalable, and all of these things. And we also wanted to make sure that we have a solution for that on top of Kubernetes. And last, but definitely not least, is the sustainability of the models in production: how do you make sure that you can sustain them and ensure that their quality is good, with minimal burden for your people? So let’s consider one of the ways to productize models, the common one. Obviously there are many tools out there, a lot of open source, a lot of technologies with which you could easily build any application or AI solution.
So what we are seeing is that the common approach for productizing a model is actually to take this model once it’s ready and to wrap it with some kind of an application or a process that takes care of bringing the data that this model needs to the model, maybe preparing it a bit, and then, once the model is executed, exposing the results of the model in some way. It could be a web service, it could be just sending a file or something, but you build such a wrapper application, a frame around the model, and you are done. It could take two or three weeks to develop such an application and maybe one week to test it, so in a few weeks you can have it ready. However, this is not a good approach, for many reasons.
First, it’s not fast. If we are talking about a few weeks to deploy a model, three or four weeks, it’s a lot. What we want is minutes; we want to have it immediately. Why wait for weeks in order to deploy a model? Second, there’s very little reuse with this approach. I mean, we are talking about a situation where each application that we have developed is doing the same processes of bringing data; maybe it’s even the same data used by multiple models, but we will still be doing the same thing again and again. So there’s very little reuse of the code that surrounds the models. There’s no real separation of concerns between the data scientist who develops the model, who may be the one that is also developing the application, which is not the right thing, because this is not a software developer, or vice versa; if a developer takes care of it, you still have a lot of going back and forth between them. And the last thing, which relates to the fact that models degrade over time: if you don’t do much, you will get a situation where the models are creating damage in production, and you are getting escalations all over, and it’s very difficult to track so many applications that are running in your environment, if this is the way you are productizing your models. So this is not the way we promote, and not what we do internally. What we do is actually implement the concept of an AI platform, and what an AI platform means is listed here. We’re talking about a platform that was designed first to make it easy to deploy models to production. This is first: you want to be able to quickly, rapidly deploy models, so the platform is actually the one that is hosting the models and serving them. That’s one thing. The second thing is about making sure that all of the data that is relevant for the model, or for the many models that are relevant for this subject domain, is being brought by the platform.
So the platform is the one that takes care of bringing all of the data that could be used by hundreds of models. And this may sound trivial, but in some cases we are talking about a complex process: it can involve crawling the web, or collecting many logs and parsing them, and applying many, many activities on the data. So this should be taken care of by the platform and not by the model. The third aspect is about the other side: you want to integrate the model into the business process, so the platform takes care of that too; the platform has all the integration hooks into the business process and handles it for many models. So this takes away from the model the concern of how to integrate the results into the process. And the last thing that I mentioned, the sustainability aspect: retraining the models, collecting the indicators, collecting the logs, all of these things are part of the responsibility of the AI platform. It means that, if I generalize it, the AI platform allows us to make it easy to deploy models and sustain them in production for very long periods, and we have models that have been running for more than 10 years in production in such platforms. So I mentioned before that we have dedicated AI platforms in different business domains, and the main reason is that we started with one that we thought would serve many domains. However, we found that there are different aspects in each platform: the data that is required, how you bring it to the platform, how you integrate back into the business; this is very business-specific. So we found that the platforms can share many aspects, but it’s better to have one per domain. And this allows us to get this deep integration into the business domain. By the way, all of our platforms are based on open source.
They can be deployed anywhere, on-prem, cloud, hybrid; we have all of them, and they all enable these continuous delivery capabilities.
But I do want to emphasize that they are still reusing some common reusable assets, so they don’t have to reinvent the wheel for the aspects that are common. It could be that for bringing the data, or integrating into the business domain, they have to be very specific, but aspects like continuous delivery, like the ones that I mentioned on the previous slide, can be common.
So this is exactly what we have done. The objectives that I mentioned before were actually implemented in a system, in a way that is consumable; it’s a modular product that each of these AI platforms can consume, and it enables all of these things, from experiment tracking to deployment with CI/CD, to sustaining in production. It has an internal name, but that doesn’t matter; I want to go through some of the aspects of the product. And again, this is reused in all of our platforms.
So let’s start with the continuous delivery notion that I mentioned. What we have done here is basically to define a set of standards that this product enforces, and once data scientists comply with them, then, once the model is ready, all he or she has to do is push the code into source control and register it in the technology that we use; in our case, we chose MLflow. And from that point, the magic happens. All of the tests are run, like in any CI/CD process; the whole ecosystem of this model, the system it has to run in, is built, deployed in containers, deployed into the right Kubernetes cluster if it’s Kubernetes. And then, if everything was successful, it becomes a release candidate. It’s ready on the shelf for a decision to push it to production. So this is just another click of a button; it could be a business decision when to go live, to go to production, it could be other considerations, but the distance from a release candidate to being deployed on the AI platform is another press of a button. So this is very powerful, the ability to get to a situation where deployment of models is no different from the deployment of any software that we do. So I mentioned MLflow, and I do want to say a bit more about it. The first reason we chose MLflow is that it supports many, many frameworks, and this gives a lot of flexibility. It also covers many of the aspects we use, mostly the experiment tracking and the reproducibility aspect, and also the registration, the Model Registry that was added there not long ago. These are the main aspects that we use. But what we have done on top of it is to enable easy deployment of instances of MLflow and tying them to the cloud.
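To make the "comply with the standard, push, and the rest happens" idea concrete, here is a minimal sketch of what such a conformance gate could look like. The required attributes and class names here are hypothetical illustrations, not the actual internal standard; the point is only that CI can mechanically check a pushed model before it becomes a release candidate.

```python
# Hypothetical sketch of a push-button contract: CI validates that a
# pushed model module exposes the expected interface before building,
# testing, and promoting it. The required fields are illustrative only.

REQUIRED_ATTRS = ("NAME", "VERSION", "predict")

class ChurnModel:
    """What a data scientist pushes: code plus metadata, nothing more."""
    NAME = "churn"
    VERSION = "1.0.0"

    def predict(self, features):
        # Stand-in for the real model logic.
        return int(sum(features) > 1.0)

def ci_validate(model):
    """The CI gate: a model missing the contract never becomes a
    release candidate."""
    missing = [a for a in REQUIRED_ATTRS if not hasattr(model, a)]
    return {"release_candidate": not missing, "missing": missing}

report = ci_validate(ChurnModel())
```

In the real pipeline, a check like this would run alongside the usual tests, and only a passing model would be containerized and registered as a release candidate.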
So instead of running the experiments only on my computer, when a data scientist uses our product, it uses MLflow and stores all of the artifacts in the internal cloud, in an object store based on MinIO. So all of the artifacts, all of the hyperparameters, everything that is logged, they are stored in a safe place where they can be shared, and this is another big advantage: they can be shared among many people, instead of just sitting on my computer, where no-one knows what progress I’ve made, no-one knows how well or badly I’m doing.
So this is one thing. The second aspect: we wanted to have a solid system for inference in production in our AI platforms, most of which, by the way, are based on Kubernetes. We wanted a serving mechanism that can work in a very reliable and flexible way on top of Kubernetes in our AI platforms, and we chose Seldon Core. Again, the main reason is the flexibility: it can run or use almost any framework, it can run anywhere once you have Kubernetes, and, most important, it also supports the notion of graphs. And this is very, very powerful. The ability to easily develop, deploy, and then assemble graphs of several models, a flow of models that will generate the right results, is in many cases needed, required, and Seldon makes it easy. But we have done something on top of Seldon: we know that our data scientists don’t like to use Kubernetes; in many cases they don’t know Kubernetes, they don’t know what a Helm chart is, and in some cases they don’t even know how to use containers properly. So we didn’t want them to deal with the aspects of Kubernetes, and we have abstracted this for them. They only define the graph in configuration, in a simple YAML, and provide all of the code for the logic, and this is automatically translated into the Helm chart that can be deployed to production. So this way we still give a lot of flexibility to the data scientists to do what they can do, focus on the model and the configuration, and the rest happens for them using the system that we have built. Once deployed to production, we talked about the aspect of monitoring the models, and this is another very important reusable component that we want to have in all of our platforms, and this is why we developed it as a reusable.
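To illustrate the inference-graph idea without the Kubernetes details, here is a minimal sketch of a declared graph being resolved into a chained pipeline, similar in spirit to a Seldon Core graph. The configuration schema, step names, and model functions are all hypothetical; the real system translates a comparable YAML into a Helm chart.

```python
# Minimal sketch: a declarative inference graph resolved into a chain of
# steps, in the spirit of a Seldon Core graph. Schema and functions are
# hypothetical illustrations, not the actual internal format.

def clean_text(x):
    # Pre-processing step: normalize the raw input.
    return x.strip().lower()

def sentiment_model(x):
    # Stand-in for a real model's predict() call.
    return {"text": x, "positive": "great" in x}

GRAPH = {
    "name": "sales-insights",
    "steps": ["clean_text", "sentiment_model"],  # executed in order
}

REGISTRY = {"clean_text": clean_text, "sentiment_model": sentiment_model}

def run_graph(graph, payload):
    """Feed the payload through each declared step of the graph."""
    for step in graph["steps"]:
        payload = REGISTRY[step](payload)
    return payload

result = run_graph(GRAPH, "  This product is GREAT  ")
```

The data scientist only edits the `GRAPH`-like configuration and the step logic; everything about where and how the steps run stays the platform's concern.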
So this involves a few things. First, the ability to easily deploy indicators for the models that you have. We wanted to give a lot of flexibility for the data scientists to be able to do it by themselves. But on top of that, we identified that there are many indicators or metrics that are very common between many projects. So instead of having each one of the projects redo them again and again, we have put into our subsystem many out-of-the-box indicators that are ready to be used and consumed. Things like the ability to identify concept drift: you take the distribution of your features and compare it to the training dataset, or even across production over time, and you apply change detection to see whether there is a big change there. Similarly, you can do the same for the inference results, to see if there is stability there or not, and you can do it for the graph as a combination. If you have the ability to bring labels back, there are many, many calculations and metrics that can be computed in that case; we make it easy to bring the labels into the system and to calculate all of these just by consuming these reusable capabilities. So this is an important capability that is, as I said, reusable, and can be consumed and deployed in all of our AI platforms.
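One common out-of-the-box drift indicator of the kind described above is the Population Stability Index, which compares a feature's binned distribution in production against the training set. The talk does not name the exact metric used internally, so this is just a representative sketch:

```python
import math

def psi(train_counts, prod_counts):
    """Population Stability Index between two binned distributions.
    Values near 0 mean no drift; > 0.2 is often read as significant."""
    t_total, p_total = sum(train_counts), sum(prod_counts)
    score = 0.0
    for t, p in zip(train_counts, prod_counts):
        # Small floor avoids log(0) for empty bins.
        t_frac = max(t / t_total, 1e-6)
        p_frac = max(p / p_total, 1e-6)
        score += (p_frac - t_frac) * math.log(p_frac / t_frac)
    return score

# Same distribution -> near zero; shifted distribution -> large PSI.
stable = psi([25, 50, 25], [24, 52, 24])
drifted = psi([25, 50, 25], [5, 20, 75])
```

A reusable monitoring subsystem can compute a metric like this per feature on a schedule and raise an alert when the score crosses a threshold.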
The way we designed it is basically quite simple: we use Elasticsearch. Once you have a logger for your production system that can log the inputs and outputs of the model, it begins there. So we are not coupled to Seldon or any specific technology used for inference; it can work with any inference technology. Once you have a logger that captures the raw data that you need, we have another layer that automatically calculates different kinds of aggregations; based on configuration, we use the transform capability in Elastic to do this aggregation online. We enable the ability to bring in more data sources if you need them; for example, if you want to know what the distribution of the data was when you trained the model, this is something that you can bring in and then use within Elastic. Then there are the labels, as I mentioned before: the ability to easily bring in the labels if you have them, the real outcomes of the predictions, so you can then calculate the metrics. And the last layer is being able to write Python code on top of all of these resources, to define a metric or an indicator that you want to calculate and write it back to Elastic. All of this is available through two main tools, Kibana and ElastAlert. Kibana is for visualization; ElastAlert basically allows you to define rules and to achieve actuation on top of the indicators. And this is basically what we have done here. This is a very powerful capability to have deployed, because it makes you much, much more proactive than you would be without it. So now I want to go back to the AI platforms and give just a taste of how an AI platform looks, so it won’t be abstract, and I want to use the Sales platform as an example.
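The layering described here, logged inference records in, an aggregated indicator out, then an alert rule on top, can be sketched in plain Python. In the real system the records live in Elasticsearch and ElastAlert fires the rule; plain dicts and a threshold function stand in for both here, and the field names are illustrative.

```python
# Sketch of the indicator layer: inference logs in, an aggregated metric
# out, then an ElastAlert-style threshold rule. Plain dicts stand in for
# the Elasticsearch documents; field names are hypothetical.

logged = [
    {"model": "churn", "prediction": 1, "label": 1},
    {"model": "churn", "prediction": 0, "label": 1},
    {"model": "churn", "prediction": 1, "label": 1},
    {"model": "churn", "prediction": 0, "label": 0},
]

def accuracy_indicator(records):
    """Aggregate logged predictions against labels brought back later."""
    labeled = [r for r in records if r.get("label") is not None]
    hits = sum(r["prediction"] == r["label"] for r in labeled)
    return hits / len(labeled)

def alert(metric, threshold=0.9):
    """Alert rule: actuate when the indicator degrades below a threshold."""
    return metric < threshold

acc = accuracy_indicator(logged)   # 3 of 4 correct
fire = alert(acc)
```

The computed indicator would be written back to Elastic, where Kibana visualizes it and an alert rule reacts to it.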
So when we talk about sales, as I said, we don’t have enough salespeople to cover all of the accounts; we have hundreds of thousands of accounts, and we have a few hundred salespeople, so they don’t cover all of them.
And we want to make sure that we are still dealing with these accounts and handling them properly, and we are using AI for that. So what we want to do is to have a system that mimics the way a salesperson would do things. I mean, what does a salesperson do? He senses the environment, understands the needs of his customer, applies some thought based on that, and interacts with the account in order to offer or suggest something. It could be a phone call, it could be some other kind of interaction, and it can lead to a success or a failure, but you will learn from it; whether it’s a success or a failure, you learn anew. It’s a continuous cycle, and this is how a human would do it. And we want to do the same, but at scale, with an AI platform.
So what does sensing mean when we talk about an AI platform? In this case, obviously, we want to cover hundreds of thousands of accounts, and I want to give this as an example of why sometimes bringing the data is not trivial. In this case, it means that we need to crawl millions of web pages. We need to crawl social media and bring in tweets, plus the day-to-day activity on intel.com and other websites, and many, many other pieces of information that we collect from the outside, and mash them up with all of the internal data that we have; it could be from the CRM system and other systems. And based on both, we use this sensing to take actions. As I said, it’s not trivial; if you had to do it for each model, it would not be economical, it would not be practical. Once you have a system that takes care of all of this for you, many, many models can benefit from the data that this sensing has made available.
And we use it in two ways. One of them: if we have a salesperson covering an account, we provide them with assists, like in sports, an insight that they can use, and we track how beneficial each insight was; we see that most of these insights were quite beneficial, more than 87%. But the more interesting thing is when you don’t have a salesperson, and then what we do is actually close the loop directly with the AI capability, Autonomous Sales. This is something that can be very exciting, and I want to show what it means from the platform perspective.
So it means a few things, and I will go through them very briefly because I don’t have much time in this session, but let’s begin with the first one. Obviously, doing the sensing is great, but if you don’t react to it in a timely manner, it’s useless. You can’t react to something that happened two weeks ago and expect that it will still be relevant; maybe it’s not. So we want a system that is always on. It’s a streaming system; it can’t be in batch mode anymore. It continuously brings the data, flows it into a message bus, and deals with all of the inputs that the system gets in a streaming manner.
And this is very, very important: an always-on streaming capability, producer, consumer. And now you can move to the next stage, which is equally important, which is microservice architecture.
You have the data flowing, and you can have different microservices acting on this data. Why is it important? Because you don’t build such a platform in a day, not in a quarter, and not in a year. You want your system to be very flexible so you can change things and add things as needed, and if you don’t build it that way, it becomes a huge monolithic system: you lose control and you won’t be able to change anything in it. So using a microservice architecture allows us to have many microservices, each one of them isolated; you can add services, you can change them, without being afraid of impacting the whole system.
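A minimal stdlib sketch of the pattern described above: a producer pushes events onto a bus, and two isolated microservices each consume from their own queue, so either one can be replaced without touching the other. `queue.Queue` stands in for the platform's message bus, and the event fields are hypothetical.

```python
import queue
import threading

# queue.Queue stands in for the message bus; fields are illustrative.
raw_bus, enriched_bus = queue.Queue(), queue.Queue()
results = []

def producer():
    # Sensing side: continuously pushes events onto the bus (finite here).
    for account in ("a1", "a2"):
        raw_bus.put({"account": account})
    raw_bus.put(None)                       # shutdown marker for this sketch

def enricher():
    # Microservice 1: isolated, only talks to the buses.
    while (event := raw_bus.get()) is not None:
        enriched_bus.put({**event, "region": "emea"})  # hypothetical field
    enriched_bus.put(None)

def scorer():
    # Microservice 2: can be changed or replaced without touching enricher.
    while (event := enriched_bus.get()) is not None:
        results.append({**event, "score": 0.9})        # stand-in model call

threads = [threading.Thread(target=f) for f in (producer, enricher, scorer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a real deployment each function would be its own always-on service reading from a durable bus, which is exactly what keeps the system from turning into a monolith.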
The third aspect is about the decision making. The system has to make hundreds and thousands of decisions every day. In many cases for us, it means deep learning inference, and we need to optimize the inference to get low latency at scale, even when we have very high throughputs of data flowing in. And this can be a challenge: when you have to run many, many models in parallel and still get optimal latency, you need to manage it asynchronously. Synchronous won’t work well here, because then your slowest model will impact the rest, and this is why we have gone and implemented a system that actually allows us to optimize the inference.
And we implemented an asynchronous approach using Asynchronous Inference Units, AIUs, so we can have many of these, and each one can be allocated per source, or per model, or per anything that is relevant in that domain. This way you can achieve a lot of parallelism in the inference mechanism, while still being able to manage, if needed, batches of data that you want to run inference on; sometimes you need sequences because you are using LSTMs or something like that. So there is the ability to batch the data, and we are using Redis for that, and on top of that to have many, many units that can do this inferencing asynchronously, continuously, which is very powerful. And by the way, it can work against any serving mechanism; it could be TensorFlow Serving, Seldon, anything, so we are not limited in the serving mechanism or technology, but we can still maximize what we get out of the hardware that is used by that serving mechanism. Another aspect is about the data.
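The AIU idea can be sketched with `asyncio`: each unit drains its own queue in micro-batches, so a slow model never blocks the others. Here `asyncio.Queue` stands in for the Redis-backed batching the talk mentions, and the model call is a placeholder; this is an illustration of the pattern, not the internal implementation.

```python
import asyncio

async def inference_unit(name, q, batch_size, results):
    """One Asynchronous Inference Unit: drains its own queue in
    micro-batches. asyncio.Queue stands in for Redis here."""
    done = False
    while not done:
        batch = []
        while len(batch) < batch_size:
            item = await q.get()
            if item is None:          # shutdown marker for this sketch
                done = True
                break
            batch.append(item)
        if batch:
            # Stand-in for one batched model call (sequences, LSTMs, ...).
            results.extend((name, x * 2) for x in batch)

async def main():
    results = []
    # One queue per unit: allocated per model/source, as in the talk.
    queues = {"model_a": asyncio.Queue(), "model_b": asyncio.Queue()}
    for x in (1, 2):
        queues["model_a"].put_nowait(x)
    queues["model_b"].put_nowait(10)
    for q in queues.values():
        q.put_nowait(None)
    # Units run concurrently: model_b is never held up by model_a.
    await asyncio.gather(*(inference_unit(n, q, 2, results)
                           for n, q in queues.items()))
    return results

results = asyncio.run(main())
```

Because each unit owns its queue, adding a new model is just adding another queue and another unit, and the batching knob stays per-model.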
Again, I’m just illustrating why an AI platform is required, because in this case, for example, many models can benefit from the way you structure your data. Obviously you would need a SQL container for your relational data; you would need something like Elasticsearch, which we have, in order to be able to use the text efficiently; you would need a big data container or object store. But what we found is that we also need to represent the data as a graph.
Why? Because you can have a lot of knowledge about each of the entities, about the people, about the companies, about the products, in isolation, but once you have the relationships between them, a whole new world of understanding can be revealed, and this is very important. Another thing is that you can accumulate data instead of just dealing with each assist and forgetting about it; once you accumulate the knowledge from each of the assists, or the insights that you got from the history, this by itself can give you a lot of insight in the future. So this is very powerful, and for this purpose we also developed a generic subsystem that is responsible for maintaining graphs, or knowledge graphs.
So in this case, again, I don’t have time to go through the details, but we again have a message bus where your data flows in, and you have asynchronous units that can translate the metadata into the graph semantics. You can also have extractors that can enrich the data from the web, to add more information about the entities that you have brought in. On top of that, you want to build the graph itself, and you have a graph builder, which is the only portion that communicates with the graph; this way you can abstract the graph and change the graph technology. We didn’t want the other units to be coupled to a specific graph technology, so the units generate the semantics of the graph, and the graph builder is the one that translates them into the specific graph technology. And on top of that, we have a system that continuously keeps this data up to date, crawling the graph and getting fresh data from the web. So this is a nice capability that allows us to sustain the graph. And the last aspect is about dealing with text, NLP, NLU. In this case, we had to deal with the fact that it’s very difficult to train models when you don’t have supervision; it’s very weak supervision, you don’t have labels. We used technology from Stanford called Snorkel, and added some capabilities around it, in order to be able to effectively train NLP and NLU models and use them in this platform.
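The decoupling between the translator units and the graph builder can be sketched as follows. The units emit technology-neutral graph semantics, and only the builder knows the concrete store, so the backing graph database can be swapped. The message schema and entity names here are hypothetical illustrations.

```python
# Sketch of the graph subsystem: translator units emit technology-neutral
# graph semantics; only the builder knows the concrete graph store.
# The record schema is a hypothetical illustration.

def translate(record):
    """A translator unit: turn raw metadata into node/edge semantics."""
    yield ("node", record["company"], {"type": "company"})
    yield ("node", record["person"], {"type": "person"})
    yield ("edge", record["person"], "WORKS_AT", record["company"])

class InMemoryGraphBuilder:
    """The only component coupled to a graph technology (dicts here),
    so the backing store can change without touching the units."""
    def __init__(self):
        self.nodes, self.edges = {}, []

    def apply(self, op):
        if op[0] == "node":
            self.nodes[op[1]] = op[2]
        else:
            self.edges.append(op[1:])

builder = InMemoryGraphBuilder()
for op in translate({"company": "Acme", "person": "Dana"}):
    builder.apply(op)
```

Swapping `InMemoryGraphBuilder` for one backed by a real graph database would leave every translator unit untouched, which is exactly the abstraction described in the talk.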
So this is the overall flow,
and this is how it looks when you put it together: a system, a platform, that is hooked into different sources, with a microservice architecture where different microservices take care of the data that is flowing in and generate, in the data layer, the data that is required. And then you have the predefined APIs that enable this platform to hook back into the business process; it could be hooking back into the CRM system, sending an email, it could be many, many things that are actually the actuation based on what the models produce. Okay, and with that, I will just summarize the things that I have tried to illustrate.
So first, the AI platform allows us to deploy the models using continuous delivery, like any other software. Then, the aspect that I’ve shown now: sometimes bringing the data to the models is not easy, and you want to separate it from the model itself, obviously; it becomes the concern of the platform and not of the models, so the platform takes care of bringing the data for all of the models that are hosted in it. And the same thing for the integration: the platform takes care of all of the integration hooks into the business process that the models are serving, okay? And then you have the sustainability aspect that I’ve shown before: indicators, retraining, logging, and all of these are part of the platform. And as I’ve said,
the platforms are reusing some of the capabilities that we have deployed in order not to reinvent the wheel each time in each platform. And this is it, thank you very much, and I hope that this was beneficial for you.
Moty Fania is a principal engineer and the CTO of the Advanced Analytics Group at Intel, which delivers AI and big data solutions across Intel. Moty has rich experience in ML engineering, analytics, data warehousing, and decision-support solutions. He led the architecture work and development of various AI and big data initiatives such as IoT systems, predictive engines, online inference systems, and more.