Industrialized Machine Learning
For Governed, Responsible
and Explainable AI
- Data and AI use cases are taking off – even in mature industries
- But challenges remain in operationalizing AI and ML
- How do I standardize the machine learning lifecycle?
- AI projects need to be scalable to make a notable business impact
- The key enablers of industrializing AI
- AI is a team sport – requiring a strong data foundation and cross-team collaboration
- An introduction to Navy Federal Credit Union
- Industrializing ML at scale at Navy Federal for strong customer outcomes
- Building a production-grade machine learning model for a personalized savings journey for customers
- Getting control and reproducing data and features is essential to industrialized ML
- Enabling teams across the company with industrialized ML
- The key pillars of MLOps – the heart of industrialized ML
- MLOps in a natural language processing pipeline
- Full machine learning lifecycle demo – feature management, model management, and model governance
- Dashboarding for cross-team feature discoverability
- Q&A
- Here’s more to explore
Introduction to industrialized machine learning
Kevin Clugage:
Hello, everyone, and welcome. Thank you for joining. I’m Kevin Clugage, Senior Director in the marketing team here at Databricks. And in a few minutes, we’ll be joined by my co-presenters from Accenture, Atish and Nate, as well as Bryan Christian from the data science team at Navy Federal Credit Union.
Our talk is called Industrialized Machine Learning. This is an approach developed in collaboration between Accenture and Databricks that's designed to help companies move beyond the machine learning pilot stage. It's based on a set of experiences and tools that came out of working with many companies like Navy Federal Credit Union, helping them scale their machine learning operations, and it takes a holistic view of how to do that. So let's go ahead and get into it and look at our first slide.
Data and AI use cases are taking off – even in mature industries
So first, just to set the context: what motivates a talk like this is recognizing that the data and AI space has massive potential. When we talk to companies about what use cases are driving their business, it becomes very clear that data and AI have tons of potential to impact and improve those areas of the business. Whether it's healthcare and life sciences with patient care, accelerating R&D for drug discovery, and so forth; financial services and risk mitigation; or retail and CPG, trying to forecast customer demand. For instance, in the current situation, demand for goods like toilet paper and wipes spiked massively, and companies had to analyze that quickly. But it also applies to media and entertainment, studying customer behavior and churn analysis, as well as the public sector, looking for fraudulent activity or helping in human services areas.
So there are lots of use cases across the board, and on the next slide are just some examples from the thousands of companies that Databricks has worked with. We've had a great opportunity to see a bunch of concrete business results. Some of those are on the revenue side; for instance, in online retail, improving recommendations for customers to drive purchase conversion generated $30 million of additional revenue in one case, and in health and life sciences, accelerating drug discovery with more advanced techniques like genomics has driven massive revenue increases as well. There are a lot of cost-saving examples, too: reducing infrastructure costs or preventing fraud, especially when dealing with massive data sets, can amount to hundreds of millions of dollars in savings, with other examples across the board. It's very clear that the business benefits are real across lots of use cases and lots of industries.
But challenges remain in operationalizing AI and ML
So if you look at some of the industry research, you can see that lots of companies are catching on to this; in this industry survey, 83% of companies said they're already planning to use some cloud-based AI tools within the next year, and 75% of those are looking at some kind of open source tooling to go with it. So very clearly there's adoption across industries, and most companies are looking at adopting these tools in a very material way. That leads to a common story: with so many companies looking to pursue these use cases and adopt these data and AI tools, what happens when they get to their first pilot? This is a great example of a company that found it took only three weeks to develop that first model, but now it's almost a year later and it's still not deployed. This happens a lot, because developing that first pilot model can be a great exercise, but there's a lot more involved in building out the entire machine learning operations cycle. So what makes that so difficult?
Well, if you go to the next slide, you can see that the data is messy; what you're dealing with needs to be managed, and it's not a one-time exercise, it's an ongoing one. There are lots of tools that people often need to stitch together, figuring out how they work and the handoffs in between them. There's often no formal machine learning operations lifecycle defined, so the processes people need to follow have to be built because they didn't exist before. And there's little to no governance over those processes or the data sets being used. So all of that needs to be constructed to support a formal operations lifecycle, taking that pilot all the way through to the production stage, and then doing it again in a continuous fashion.
How do I standardize the machine learning lifecycle?
So the next point, if you advance, is that machine learning teams often spend over 50% of their time just maintaining existing models because of some of those inhibitors, dealing with the data, the tools, the operational handoffs, and so forth. That can be quite a productivity drain and takes time away from improving those models or building new ones. There is a path forward, though, if you start to think about this lifecycle of operationalizing machine learning and how it flows between the folks who are preparing the data, the folks who are building the models, the ML engineers, and the folks who are deploying into the applications for the business. That's a continuous cycle that goes around forever. It depends on the data people are using, there are lots of ways to evaluate its performance, and there are lots of teams and systems involved.
So if you keep this mental model in mind as we talk through the rest of this presentation, you'll start to get a picture of what's needed at each stage, in a holistic fashion, to help make your machine learning pilot part of an ongoing, robust lifecycle. With that, I'm going to turn it over to the rest of the presenters. As a quick overview: Atish is going to talk about industrialized machine learning at scale; Bryan Christian, from the data science team at Navy Federal Credit Union, will talk about how they've applied this and what interesting use cases they've been able to address; and then Nate is going to give us some of the details on what's in this industrialized machine learning framework from Accenture and wrap us up with a demo. There is a Q&A box, and I'd encourage everyone to enter your questions as we go; we'll have time to answer them. And with that, I'm going to turn it over to Atish.
AI projects need to be scalable to make a notable business impact
Atish Ray:
Thank you, Kevin. Hi, everyone. This is Atish. I thought I'd spend a few minutes providing a general overview of what we see going on with AI in the industry right now. What we're generally seeing is that AI at scale is becoming a very important factor for businesses. If you go to the next slide, we did a detailed survey of executives across industries to understand how they view AI. It's clear, based on our survey, that about 80-plus percent of those executives see scaling AI as a strategy to grow their business. And when we went to the next level of detail, we also found that about 70-plus percent of those execs are having challenges scaling their AI.
Now, we also found from the surveys that scaling AI is directly correlated with the success rates and the return on the investments companies are making in the field; there's a clear difference between the companies that are still in the proof-of-concept stage and the companies that are successfully scaling AI based on strategic investments. Looking at the different stages of AI maturity, what we saw from our survey is that there are really three stages. The first stage is what we'd generally call a proof-of-concept factory; that's where we see projects that are mostly IT-led and departmentally focused. The investments aren't large, it's not on the C-level agenda, and it's mostly being applied to develop some proofs of concept. The alignment with the business isn't clear. About 80 to 85% of the companies we interviewed are in this space.
The next level of maturity is what we'd generally call strategically scaling. These are the companies where AI has started making it onto the C-level agenda; they have a CAO or CDO in place, they have an operating model in place. They're using AI at scale, but mostly for point-solution personalization. This is also where we see the onset of what we'd call machine learning industrialization starting to take shape, to help them strategically scale those point solutions. About 10 to 15% of the companies are in this stage. And finally, the companies that are really industrialized for growth, which is less than 5% in our survey, have a digital platform mindset. They've industrialized the ML part of it, they're delivering AI across multiple business units and use cases at enterprise scale, and they have a clear vision and accountable operating models in place.
So this is where ideally everybody would like to go. Our surveys also show a clear relationship between how far you've progressed in strategically scaling and industrializing for growth and the financial metrics associated with your company. When we did the survey, we saw that metrics such as your P/E ratio, your enterprise value to sales, or your price to sales can improve drastically based on how mature you are in scaling AI. So that gives us a huge driver to start working toward level two or level three maturity. And the important thing to understand is that the enablers are now in place.
The key enablers of industrializing AI
So if you move to the next slide, we quickly wanted to cover the four key enablers that we see being leveraged in practice to get to scale. First, the growth and adoption of cloud services: cloud brings the storage and processing scale you need to be able to scale machine learning. Second, platforms for data science, for example the Databricks Unified Platform, bring some of the fundamentals you need to scale, such as collaboration, management, and governance, so you can scale machine learning without having to build a lot of these things from scratch. That brings the agility and flexibility you need to do it.
Third, the commoditization of data science has led to the development of a lot of third-party tools, technologies, and frameworks that make it easier for citizen data scientists, and for these skills generally, to be applied in a much more democratized way, which makes the development and deployment of these solutions easier. And finally, the lessons learned and best practices we've accumulated from the successful companies that have scaled in this space give us a clear blueprint to take our next steps and adopt this in a much more organized fashion.
AI is a team sport – requiring a strong data foundation and cross-team collaboration
Now, even with these enablers in place, if you move to the next slide, challenges still exist. The data foundation continues to be a challenge, so a lot of focus has to go into getting it right, getting the data into a state where it's both trustworthy and consumable. Agility and automation is another area where a lot of work needs to happen to reach the level of industrialization. Governance and controls are a key factor: having the right operating model, the right collaboration, and the right set of controls in place so that you can do this at scale in a multi-user, multi-tenant environment.
And finally, scaling the deployment, which is where we see a lot of companies traditionally struggle, needs to be accounted for. What we're seeing is that to address these challenges, the successful companies moving into the industrialized space put a multidisciplinary team in place, which becomes key to successful implementation and scaling. You see some of the key roles here, like data scientist, machine learning engineer, business analyst, and data architect; how you form a cross-disciplinary team so these different roles and personas work in a collaborative manner becomes very key to your success.
Now I'm going to let Bryan talk through how he has led NFCU to scale and industrialize using some of these techniques and enablers we've talked about, in a way that lets him serve his business applications in an industrialized and scaled fashion. Bryan?
An introduction to Navy Federal Credit Union
Bryan Christian:
Thank you, Atish. My name is Bryan Christian. I'm the Data Science Lead for Mission Data at Navy Federal Credit Union. Today, I'm going to talk about how we've leveraged industrialized machine learning, but first I'd like to introduce Navy Federal as a whole. Navy Federal is a member-owned, not-for-profit credit union, as all credit unions are, and Navy Federal's mission is to always put members first. We have a lot of members, 9.5 million and growing, which makes us the largest credit union in the United States.
Now, while the name might signal only Navy membership, and indeed that's how it was back in 1933 when we started, our membership is now open to a broad group of individuals, including the Department of Defense, Navy, Marine Corps, Army, Air Force, now the Space Force, Coast Guard, veterans, and family members. Each area of the credit union operates with the same purpose in mind, which is making members' financial goals the top priority. And because we're a not-for-profit, any surplus funds are returned to our members as dividends, lower loan rates, or improvements to our products and services, which means great rates, lower fees, and exclusive discounts. So in short, at Navy Federal it's all about the members; our members truly are the mission.
If you go to the next slide: serving 9.5 million and growing members takes a lot of people; 21,000 employees across multiple campuses, the largest in Vienna and Winchester, Virginia, and in Pensacola, plus 341 branches worldwide, including 26 international. We truly have a commitment to excellent member service, and it's the cornerstone of everything we do. It's a core aspect of the company, of what we're driving for, and of what I'll tell you about next, which is Mission Data.
Industrializing ML at scale at Navy Federal for strong customer outcomes
And so, one of the things that we’re doing is centralizing our enterprise member-centric initiatives and Mission Data is helping to focus those through a machine learning and AI perspective. So Mission Data itself is a multi-year digital and analytics transformation at Navy Federal, and it’s really centered around delivering that next-generation member experience. Importantly, this is not a project or a program, but it’s really an enterprise transformation that we’re completing through our engagement, both with Accenture and Databricks. After some exploratory learnings, Mission Data started just about a year ago, and we’ve already seen some tremendous success in our capabilities, and we’re driving more and more personalization.
So our strategy is really focusing on these four pillars you see on the screen, that is Member-Centric, Data Driven, AI Powered and Cloud First. Now, Navy Federal is not new to machine learning or AI, we’ve been doing that for many years, so what’s the difference? And the key difference is industrialization and scale, really the cornerstones of the talk today. So we’ll be talking more and more about how Navy Federal is actually able to accomplish this with Mission Data.
So if you go to the next slide, we can address: why Mission Data? Why now? Well, the traditional relationship between businesses and people is evolving. Successful companies are bringing more human focus to their digital interactions and designing a truly collaborative digital experience, and this is shifting customer expectations. How many of you have a smartphone or a smartwatch with a fitness app that tells you when to take that next step or nudges you to do something? It personalizes the experience to you, versus someone else who might have the same application on their watch or phone. That's the kind of personalization Navy Federal is now driving toward with Mission Data.
And we know that 30% of customers now expect the companies they engage with to know more about them than ever before, so this is a perfect arena for AI and ML to drive that relationship to the next stage. Further, and this is very important for financial services, we know that 88% of customers find a company that can personalize their experience without compromising trust to be much more appealing and relevant to their needs. So with that, Mission Data has a robust governance platform that intersects all aspects of what we do, and especially the AI and ML components.
Building a production-grade machine learning model for a personalized savings journey for customers
So if you go to the next slide, I’ll walk you through an example. So as I mentioned before, everything we do focuses around providing our members the best service possible. So we have member-centric goals. So let’s take this example of a member-centric goal, which is to help our members achieve a financial cushion for a rainy day scenario. Obviously, very important during the pandemic, but also even outside of these current circumstances. So our solution was to build a machine learning model to predict which members looked like they’re about to begin saving.
With such a model, we're able to communicate relevant messages to those who are on the brink of saving, and for others who aren't quite ready to begin yet, we can address their savings journeys in a more personalized and relevant manner. So this really ups the game for personalization across the board. We've also done this within our enhanced governance framework by automating key model evaluation metrics and our MLOps processes, with automation such as ROC curves and confusion matrices for classification models, and so on. We also have robust fairness metrics we explore to ensure there's no disparate impact across different demographics. Even for models that are highly explainable, we double down to make sure the models are responsible and present a complete, fair picture for the member.
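(For reference, a minimal sketch of what automating classification evaluation metrics like these might look like on Databricks with MLflow; the helper and column names are illustrative assumptions, not Navy Federal's actual implementation.)

```python
# Minimal sketch: automated evaluation metrics for a binary classifier,
# logged to MLflow so every model run is scored the same way.
# Assumes `scored` is a pandas DataFrame with a ground-truth `label` column
# and a predicted-probability `score` column (illustrative names), and that
# no MLflow run is already active.
import mlflow
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve, confusion_matrix

def log_classification_metrics(scored, threshold=0.5):
    y_true = scored["label"]
    y_score = scored["score"]
    y_pred = (y_score >= threshold).astype(int)

    # Headline metric plus the full confusion matrix.
    auc = roc_auc_score(y_true, y_score)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    with mlflow.start_run():
        mlflow.log_metric("roc_auc", auc)
        mlflow.log_metrics({"tn": float(tn), "fp": float(fp),
                            "fn": float(fn), "tp": float(tp)})

        # Plot and attach the ROC curve as a run artifact.
        fpr, tpr, _ = roc_curve(y_true, y_score)
        fig, ax = plt.subplots()
        ax.plot(fpr, tpr)
        ax.set_xlabel("False positive rate")
        ax.set_ylabel("True positive rate")
        fig.savefig("/tmp/roc_curve.png")
        mlflow.log_artifact("/tmp/roc_curve.png")
```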
Getting control and reproducing data and features is essential to industrialized ML
And keep in mind, Mission Data is truly focused on the member experience. We have other areas of the Credit Union that focus on risk and underwriting, and they, too, go through these processes as required. But we wanted to bring this over to the experiential side as well, because we believe responsible AI is crucial, especially as we begin scaling up to hundreds and thousands of models in the future. Of course, all of this transformation doesn't come without challenges. Some of the technical challenges we addressed were data wrangling, feature engineering, and code silos. With vast amounts of data, a lot of time is spent cleaning and transforming the data, getting it into a state where you can make valid predictions for the use case or model you're working on.
And many times, I've found myself building a feature and wondering, "Hey, haven't I built this feature just a few months ago, or isn't there someone else among these 21,000 employees who has probably also built the same feature?" Why should we have to recode it over and over again? That's really a code silo, where you can't see either what you've done before or what someone else is doing elsewhere in the organization. One of the benefits of MLOps and industrialized machine learning is that you break down these silos so the same code can be leveraged over and over by different data scientists and ML engineers, and it's clearly documented in a centralized location through which everything moves to production. What this enables us to do as data scientists is focus more on building models and solving business challenges, and less on moving those models to production.
Now, that's an extremely important step, and the only way it can happen is with an industrialized ML framework. It also allows the ML engineers to focus on moving things to production, and by separating those responsibilities, you get more efficiency across the board. This does come with some team challenges, because up until now most people were used to doing everything: you build the model, you productionize it, and you release it, while also going through all the governance processes we have across the Credit Union today.
Enabling teams across the company with industrialized ML
So a large part of this is change management: how do you work with teams as you're enabling others across the organization to use these capabilities? You want to make sure they know, "Okay, this is how it's a little bit different, it's actually going to make your life far easier, and you'll be able to move through more models and increase your scale and output much more quickly." This comes with an enormous amount of knowledge transfer, so we've built an accelerator MVP where we ramp up teams of data scientists across the Credit Union on these capabilities and then release them to go build models that solve their business cases as well.
And of course, this comes with the automation we talked about previously, the automated model evaluation metrics and automated model governance. This also opens up the other bottleneck you see, which is model evaluation: is this model good enough, is it evaluated in a standard way, and ultimately, is it a fair model as well? If you go to the next slide, you'll see that one of the primary benefits we've observed is that the data engineering and data wrangling work that used to take about 80% of our data scientists' time in the model-building arena, getting the data into a state where you can manipulate it and move on to actually building models, has been reduced to about 20%. So we've really turned an enormous corner there in terms of efficiency and removing bottlenecks, and this is helping Navy Federal grow and move into this next stage of AI and ML.
Now, Mission Data is growing as well, and if you go to the next slide, this takes a number of individuals on a growing team. We always have opportunities available in a number of specialties, and here are just a few that are commonly available. I'd encourage you to check the Navy Federal career site linked at the bottom of the page if you're interested in joining the Mission Data team and the Navy Federal team. And there are a number of reasons to do so, apart from everything I mentioned, which is extremely exciting from a technical perspective. The company really puts members first, so you get to do these really awesome things with data and feel really good about how you're helping members along the way as well.
And that, for me, has been the most valuable part of working at Navy Federal. But there are a number of other benefits as well. In our over 80-year history, we've never had a single layoff, which is tremendous. So there are a number of reasons why you might be interested in joining the team. If you go to the next slide, you'll also see a number of accolades that support this. For several years, we've been on Fortune's Top 100 companies list, a Best Workplace for Millennials, and a Best Employer for Women. So I really encourage you, if this is something that interests you, to check our career site and please reach out. And with that, I will pass to Nate.
The key pillars of MLOps – the heart of industrialized ML
Nathan Buesgens:
Thanks, Bryan. Hello. My name is Nate and I’m a consultant on the topic of ML engineering. And what I’d like to do is dig a bit deeper into the pillars of this industrialized ML approach. These are the key ideas that informed the approach that Bryan has described. And as Kevin and Atish have described, we’d like to provide you with a roadmap that takes you from a proof-of-concept factory to this industrialized and scalable AI strategy. And so, on the next slide, we’ll see the first pillar.
We want a balanced focus on the challenges that occur throughout the ML application life cycle. So, that’s not just the data science work that we’re doing, it’s also the ML Operations. And in and of itself, this isn’t an entirely novel observation, there’s an emerging field of engineering around machine learning. A number of organizations have already begun to feel the growing pains of operationalizing their machine learning applications, and there’s lots of existing material out there for how to automate the deployment of a data science notebook.
But what differentiates this approach is a focus on not just automation, but also governance as a key component of those ML Operations. A click-button deployment of a notebook doesn't necessarily accelerate you if you don't have confidence in the quality of the model in that notebook. And so, what really differentiates our approach is that Accenture will work with your organization to identify your organizational values for model quality and model fairness. Then we codify not just the automated deployment of your models, but also the controls that implement those organizational values.
So, the second pillar of industrialized machine learning is the operational management of not just the model, but the full end-to-end ML pipeline. Model management using MLflow is a key and necessary component of an industrialized ML application, but it's not sufficient. And these are terms that are sometimes used loosely, terms like "model" and "ML pipeline", so we'll start with some brief definitions. If you click through, the ML pipeline is the suite of estimators which are fit to the data in tandem through the training process to produce a model. So the model is the output of the training process, and when we're training a model, especially in an industrialized context, we're commonly wiring together many estimators, each feeding features downstream to the next, to produce the pipeline which is fit wholesale to the data.
And so, model management serves as a repository for those training artifacts, the serialized model, the model quality metrics, and it provides that workflow for promoting the model into production. But in order to codify our governance standards, we also want to manage the training run time. So we want to manage the construction of the pipeline from our feature estimators and the training process that fits our pipeline to the data. We don’t want that to be an ad hoc process which is spread across disjoint notebooks. Bryan described the Mission Data governance standards, we want the confidence that we’re complying with those standards because we’ve isolated those concerns in a common training run time.
We also want to manage the prediction run time, the run time that's making our predictions in production. We want to do this to help address the challenges of collecting feedback from production and identifying things like data drift. And finally, we want to manage the features that are going into our model. There are a lot of unique data management challenges that occur in machine learning applications; we need to solve not just our classic data management challenges, things like data discovery or feature discovery and streaming and batch analytics, but also novel challenges like the management of our holdout data or pipelining features through multiple models.
So I've described two of the key pillars for industrialized machine learning and how we're augmenting the model management, runtime management, and Delta Lake capabilities of Databricks. Feature Flow is a technical implementation of those industrialized machine learning concepts. It's implemented as a Databricks Python library that includes an API for deploying these managed jobs, the managed training job, the managed prediction job, and a managed feature store. It also provides a suite of common analytics and model evaluation algorithms, which generally accelerate the data science process. We're going to demo a workflow that uses this package today, but first I want to elaborate and provide a few more details on what we mean by management of the ML pipeline and why this is so essential.
So this is a statistic from Gartner that most organizations with operational experience will probably be able to relate to: only 20% of our analytic insights, when they make it into production, are going to move the needle. The reality is that data science is an iterative and experimental process; it's a science, and sometimes the business hypothesis that drives a data science experiment is invalidated in production. So what does this mean? Does it mean we're writing off 80% of our data science investment? It doesn't have to, if we're able to capture, in addition to the model, the many other artifacts produced throughout the ML pipeline that have business value in and of themselves, if only to accelerate future experimentation.
MLOps in a natural language processing pipeline
This is an example NLP pipeline that's producing a sentiment model. Here we can see that the sentiment model is one of many artifacts that go into this pipeline, which is producing our predictions. Some of those other assets include the feature data: this tokenized and vectorized text can be reused in future NLP models. In addition to the feature data, if we have good management of the stages of this pipeline, the featurization strategies themselves can inform future research. And these intermediate models often have explanatory value in and of themselves. Here we're showing a Word2Vec model, which can have explanatory value like many other embeddings; similarly, if you're doing clustering or segmentation as an input into your model, that itself can have explanatory value.
And finally, if we're effectively capturing the production feedback necessary to validate or invalidate our hypothesis, and we're able to capture it consistently and in a standardized, governed way, that can drive the next iteration of our experimentation. On the next slide, we go into detail and enumerate the kinds of governance and automation that we're layering on top of our industrialized ML applications. To reiterate, these are challenges that can only be addressed with a holistic view of data science plus ML Operations, and model management plus ML pipeline management. Take deployment automation, for example, and the challenges we consider there: there are common CI/CD controls, such as enforcement of code quality, that are often overlooked in data science applications.
The CI/CD process gives us the opportunity to register our featurization and ML pipeline artifacts using tools like MLflow to make them discoverable and reusable. And then, during the automated deployment, we also have the opportunity to identify ways to optimize the execution of our pipeline. For example, if there are multiple pipelines producing multiple models, maybe there's an opportunity to find some efficiencies by combining them into a single pipeline. Through the management of our training run time, we're able to enforce standards for model quality and model fairness; this is where we enforce compliance with our governance standards and values for model fairness. And through management of the production run time, we're able to version our models and implement strategies for A/B testing. We're also able to monitor our models and standardize how we collect feedback from them to identify data drift.
To implement this governance and automation, we hook into the workflow for deploying a model, which we're visualizing here on the right side of the screen, in some key locations. This is the workflow we're going to demonstrate today, and there are a few key things to highlight. First, between step two, the data science research process, and our deployment automation process, the artifact that gets handed off is not the model itself. Instead, it's the research artifacts that describe the ML pipeline which is going to be deployed. The deployment process deploys the ML pipeline to be trained using a managed training process. The output of that process in step four is the model, which goes through the Databricks and MLflow model promotion workflow in order to finally be used by our managed prediction run time to make predictions.
Full machine learning lifecycle demo – feature management, model management, and model governance
So now what we’re going to do is we’re going to demo this workflow, starting with the data science research environment where the scientist is doing their algorithm design to document the changes that need to be made in their production ML pipeline. This is the notebook interface in Databricks, where a data scientist is probably going to spend most of their time. And from an MLOps perspective, we want to do our best not to dictate or reinvent that data science process, but at the same time, we’d ideally like to push governance and automation as far upstream into that data science process as possible.
This allows us to potentially accelerate data science research by enabling our scientists with additional tools. It helps us to ensure consistency and reproducibility, not just between models that we deploy into production, but also consistency and reproducibility in that transition from the data science research to production. This just generally helps us avoid surprises and anything getting lost in translation. So here in this first cell, we’re showing a standard header that we like to encourage our data scientists to include at the top of their research notebooks. This gives the scientist access to a suite of standardized and governed analytics and also tools the notebook so that we have the option to create a smoother transition for this notebook into production with minimal code changes.
So one advantage of installing the Feature Flow analytics utilities and importing Feature Flow in this way, as opposed to, for example, installing it on the interactive cluster, is that it allows us to parameterize the version of the Feature Flow package we use. Scientists can use the version that makes the most sense in an interactive context; they can get bleeding-edge updates, or they can pin their notebook to some branch of the version-controlled code. But then, as this notebook transitions from an interactive notebook into something that's potentially being deployed into production, we can parameterize it correctly based on the environment it's being deployed to.
So for example, we can tool the notebook to use an interactive experiment. In this interactive context, we’re using the Databricks experiment that they manage and associate with this notebook. And in a similar way, we can parameterize where the notebook is getting its data from and where it’s writing its data to.
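(As an illustration of that notebook-parameterization pattern on Databricks, a minimal sketch follows; the widget names `env`, `input_path`, and `experiment` and the data path are hypothetical, and this is not the Feature Flow header itself. `dbutils` and `spark` are the ambient objects available in a Databricks notebook.)

```python
# Minimal sketch: parameterize a research notebook so the same code can be
# promoted across environments. Widget names and paths are illustrative.
import mlflow

dbutils.widgets.text("env", "dev")
dbutils.widgets.text("input_path", "/mnt/raw/tweets")
dbutils.widgets.text("experiment", "")  # empty => use the notebook's own experiment

env = dbutils.widgets.get("env")
input_path = dbutils.widgets.get("input_path")
experiment = dbutils.widgets.get("experiment")

# In an interactive context, fall back to the experiment Databricks manages
# for this notebook; in a deployed job, point at a shared, named experiment.
if experiment:
    mlflow.set_experiment(experiment)

df = spark.read.format("delta").load(input_path)
```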
So to make this a bit more concrete, here's an example ML pipeline which is pulled almost verbatim from the Spark documentation. Here, we're defining the stages of our pipeline, where we're doing some text tokenization and text vectorization before we finally create a sentiment model. We collect all the stages into a pipeline, and then we fit the stages of that pipeline in tandem to the data. And as a reminder, this is not our target state for production. What we'd prefer is not to explicitly construct this pipeline, but instead to govern the construction of this pipeline and how we fit the model to our data; we want to isolate those concerns in a managed process. But this is typical for the initial research stages of an industrialized ML workflow.
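(Since the pipeline being described follows the standard Spark ML documentation example, a sketch close to it might look like the following; the `training` DataFrame and its `text`/`label` columns are assumed.)

```python
# Sketch of the pipeline described above, adapted from the Spark ML docs:
# tokenize text, hash it into term-frequency features, then fit a
# logistic regression as the sentiment model. `training` is assumed to be
# a DataFrame with `text` and `label` columns.
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashing_tf = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.001)

# The stages are collected into a single pipeline and fit in tandem;
# the resulting PipelineModel is "the model" in the sense defined earlier.
pipeline = Pipeline(stages=[tokenizer, hashing_tf, lr])
sentiment_model = pipeline.fit(training)
predictions = sentiment_model.transform(training)
```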
So once we have that model, here we're showing some of the native model management capabilities. This is where model management enters the picture: we're showing the integration of MLflow with Databricks, where we're collecting some model quality metrics and then logging those metrics and tracking those artifacts using MLflow. In an interactive context, these artifacts now become accessible via the notebook; we can track the performance of our model over time and keep notes during the data science process. But in addition to this data science use case, it's really the ML engineer who's getting a lot of additional value out of this model management capability, because it allows us to validate that when we go into production, we've been successful at reproducing the experiments that are in this notebook.
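(A minimal sketch of that MLflow tracking step, reusing the `sentiment_model` and `predictions` from the sketch above; the run and metric names are illustrative.)

```python
# Minimal sketch: track the fitted pipeline and its quality metrics with MLflow.
import mlflow
import mlflow.spark
from pyspark.ml.evaluation import BinaryClassificationEvaluator

evaluator = BinaryClassificationEvaluator(rawPredictionCol="rawPrediction",
                                          labelCol="label",
                                          metricName="areaUnderROC")

with mlflow.start_run(run_name="sentiment_research"):
    auc = evaluator.evaluate(predictions)
    mlflow.log_metric("areaUnderROC", auc)
    # Log the serialized PipelineModel so the run can be reproduced later.
    mlflow.spark.log_model(sentiment_model, artifact_path="model")
```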
Sometimes it can be difficult to encourage consistent usage of these model management capabilities during the research phase, and standardized artifact collection is what's necessary to have a consistent way to know that nothing has gotten lost in translation. An alternative approach to asking the scientists to include this boilerplate in a consistent way is to instead tool the analytics the scientists are already doing, in order to create value for the scientist while at the same time standardizing how we collect these metrics. So here's an example of that.
Here, we're providing a utility to help the scientist create the Facets visualization; it's an open source visualization that can be found online and is often used to assess data quality.
This shows how we can accelerate data science analytics with these common utilities and, at the same time, create a smoother transition into production. Here we're creating value for the scientists with a simplified API to create this Facets visualization. We're providing a common repo for these analytics, so if there's a common analysis a scientist is doing, they don't have to copy and paste it from notebook to notebook, creating potential errors and technical debt along the way. We can also add operational improvements to these analytics. For example, much of the documentation for the Facets visualization computes the summary statistics being visualized using pandas; we do this in a more scalable way using Spark DataFrames. That helps accelerate scientists who might be trying to do analytics at scale, but it also creates a smoother transition into production, where scalability might be a bigger factor.
And then the final thing we're doing here is tooling these analytics to standardize how we collect artifacts in MLflow. With this utility, we've collected those summary statistics in MLflow. This is the kind of thing that gives us insight into whether data drift or data skew is happening in production: we can collect these same summary statistics in production and then compare them to what the scientist was seeing in their research environment.
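(One way to standardize that kind of summary-statistics collection, sketched with Spark's built-in `summary()` and an MLflow artifact; the `features_df` DataFrame, helper name, and file path are assumptions.)

```python
# Sketch: compute summary statistics with Spark (rather than pandas) and
# attach them to the active MLflow run, so the same profile can be recomputed
# in production and compared against what the scientist saw in research.
import mlflow

def log_feature_summary(df, name="feature_summary"):
    # count / mean / stddev / min / quartiles / max per column
    summary_pdf = df.summary().toPandas()
    path = f"/tmp/{name}.csv"
    summary_pdf.to_csv(path, index=False)
    mlflow.log_artifact(path)

log_feature_summary(features_df)  # `features_df` is an assumed feature DataFrame
```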
Here's another example, a ROC curve. Again, just a common data science analysis that's been standardized in a version-controlled repo to accelerate the scientist in creating these analytics. But the tooling here also ensures that these artifacts are getting collected in MLflow.
So far we've seen an ML pipeline that's being managed ad hoc by a data science research notebook, and here's an example of where that starts to break down. Here's a small change, actually a small extension, we'd like to make to this pipeline. Maybe we want to do a lot of the same text analysis, but instead of creating a model of sentiment, we want to create a model of tweets that should be escalated to a moderator. We can see that the new business logic here is pretty minimal, but what we'll often see is many disjoint notebooks where we've copied and pasted large portions of the pipeline and the pipeline construction, and more importantly, the assessment of the model and the governance of the model.
And so our target state, especially as we build repositories of many production models, is a process where we’re not coordinating many disjoint notebooks, but instead we have an automated and governed process for managing the construction and training of the pipeline. In other words, we’d like to isolate the concerns where things like model evaluation and model artifact selection are happening so that we have a consistent approach to that across all these models.
So we want an alternative API for model deployment, or more specifically, an alternative API for ML pipeline deployment: one where we declare the stages of our pipeline, and the deployment API isolates the concerns of constructing and governing the pipeline. Here we can see that by using this API, we've registered the stages of our pipeline as well as some metadata about it, and we've isolated the concerns of constructing that pipeline. These are the stages of our pipeline; we've managed the construction of those stages into a pipeline, and soon we'll see how we've also isolated the concerns of governing it. So if in the future we have better ideas about how to assess the quality and fairness of our models, we have a single place to mature that evaluation logic, rather than needing to replicate it across the hundreds of notebooks we may have deployed.
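(Feature Flow's actual API isn't shown in the transcript, so purely as a hypothetical illustration of the pattern being described, declaring stages plus metadata and handing construction, training, and governance to a managed deployment call, a sketch might look like this. The names `PipelineSpec` and `deploy_pipeline` are invented for illustration and are not the real Feature Flow API.)

```python
# Purely hypothetical sketch of a declarative pipeline-deployment API;
# these names are NOT the real Feature Flow API. They only illustrate
# registering stages plus metadata and letting a managed process construct,
# train, and govern the pipeline. Reuses `tokenizer`, `hashing_tf`, `lr`
# from the earlier pipeline sketch.
from dataclasses import dataclass, field

@dataclass
class PipelineSpec:
    name: str
    stages: list              # estimator/transformer stages, declared but not wired
    owner: str
    tags: dict = field(default_factory=dict)

def deploy_pipeline(spec: PipelineSpec):
    """Register the declared stages; a managed training job would assemble them,
    fit the pipeline, run the standard quality/fairness checks, and log
    everything to MLflow. Shown here only as a stub."""
    print(f"Registering {spec.name} with {len(spec.stages)} stages for {spec.owner}")

deploy_pipeline(PipelineSpec(
    name="tweet_moderation",
    stages=[tokenizer, hashing_tf, lr],   # reuses the shared text stages
    owner="mission-data",
    tags={"use_case": "moderation"},
))
```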
So some of the things we've accomplished here: we've isolated concerns for governance, and we've also created some computational efficiencies. Instead of calculating these text features over and over again, we've calculated them once and used them to serve multiple models. If we make these features discoverable, that can accelerate future research in the same way a feature store might; in a lot of ways, our managed pipeline serves a similar role to a feature store. And one thing we get here that we wouldn't necessarily get out of a feature store is IO efficiency: because we have a single managed run time, these features only need to be loaded once for the pipeline and can then be used to serve multiple models.
More specifically, when we say the managed training run time, we mean a Databricks job. Here we have a job for training this pipeline, which is managed by Feature Flow, as well as jobs for making predictions with it. We'll see that we have two jobs for making predictions, and that's because of the model management capabilities in Databricks, which allow us to promote a trained model either to a production status or to a staging status; so we have one job for each, depending on how the model has been promoted. When training this model, we'll see that we've standardized the analytics we're performing on it. There's a single place where we've defined the model quality analytics for both our sentiment model and our moderation model, and we're collecting many of the same artifacts we collected in the interactive context. In the future, if we want to extend these analytics, we can do that for all models by going to that single place.
We can promote that model to be used in production using the basic capabilities of Databricks model management. And so, that’s now the model that would be picked up by our prediction jobs. So what I’ve demoed so far is using this deployment API through a notebook. And when we do this, we essentially evolve our experimentation notebook into a deployment script with all the original analytics up top serving as an integration test. But teams may choose to use a Python module or a Python executable when they migrate this into production. We have that option as well, we’re not prescriptive about that. This is just a Python API and you can use it from anywhere that you would use a Python API.
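(The promotion step described here maps onto the MLflow Model Registry; a minimal sketch follows, with the registry model name and run ID shown as placeholders.)

```python
# Minimal sketch: register the trained model and promote it through the
# MLflow Model Registry so the staging/production prediction jobs pick it up.
# The registry name "sentiment_model" and the run ID are placeholders.
import mlflow
from mlflow.tracking import MlflowClient

run_id = "..."  # the training run that logged the model
model_version = mlflow.register_model(f"runs:/{run_id}/model", "sentiment_model")

client = MlflowClient()
client.transition_model_version_stage(
    name="sentiment_model",
    version=model_version.version,
    stage="Production",   # or "Staging" for the staging prediction job
)
```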
Dashboarding for cross-team feature discoverability
So now we have all the basic components of an industrialized ML application. To give a broader sense of how these components can work together, I'm providing an example of a dashboard. With the feature management capabilities of this pipeline, we can make our features discoverable. And we're able to not just make the features discoverable, we're also able to associate those features with the strategies, the pipeline stages, that are either consuming or producing them.
We can drill down into the managed pipelines, and because we’ve standardized the collection of these artifacts, it allows us to build applications on top of those artifacts and build interesting dashboards like this. So with that, I want to take us back to where we started and thank you for engaging in this conversation about doing data science research, but not stopping there and also going beyond that to develop and deploy and govern machine learning applications.
Q&A
Kevin Clugage:
Great. Thank you, Nate. Hi, folks. This is Kevin again. I'm going to pose some Q&A questions, but first, thanks to all of our presenters, Atish, Bryan, and Nate, as well as for the demo. That was really helpful.
So I’m going to start off with a question, for Atish actually, going back to the earlier part of the presentation.
Atish, you outlined the progression that companies often make in three different phases and most companies are still in that first proof-of-concept phase. Would you just give a little color on what’s the most common reason why companies that are in that proof-of-concept phase actually move on to the second and third phases?
Atish Ray:
Yep. Great question, Kevin. I think the primary driver we see is really when it becomes a C-suite priority, which leads to a situation where an AI strategy is put in place and the business objectives are aligned, which is a big driver for that strategy. The outcomes get lined up and expectations are set. The initial operating models are put in place, which leads to a situation where there's a CAO or a CDO taking charge of driving this scaling and industrializing effort in alignment with the business. So that's what we've observed as the primary driver.
Kevin Clugage:
Yeah. Okay. Great, thanks. Bryan, let me give you one of these questions.
You talked about the journey for Mission Data and you’ve been on it for a while. But can you just give a picture of how many models you will have when you guys reach full scale?
Bryan Christian:
Sure. That's a great question, and we're very much still in the middle of this transformation to doing all of this with industrialized ML. But it's going to be in the realm of hundreds of actively maintained models in production, if not thousands. And keep in mind, from a Mission Data perspective, that's along the lines of providing that next-best experience and overall member service. That's really the lens those models are focused on: coordinating them across all channels, coordinating how people respond, and taking those responses back into consideration to inform future modeling.
And there’s a lot of different layers that will all play together. And part of that scale is also that it happens across the organization. It’s not just Mission Data and a small team going through and doing this… I guess Mission Data’s not a small team, but it’s not just one singular team. It’s really democratizing AI and ML across the organization so other data scientists and advanced analysts can go leverage these capabilities and leverage the scale that’s possible. And that’s really what’s going to push us to that final stage which we’re coming up on.
Kevin Clugage:
Let me ask Nate a question related to his section.
You mentioned evaluating model quality and model fairness, I think you showed an example in your demo. But just to give folks a sense for what you showed in the demo related to evaluating model quality at the point in the process where it makes sense to check in on that, you showed one example. Could you just reiterate what that example was and how maybe in other areas you’ve seen people evaluate model quality and model fairness? What kind of metrics do people use?
Nathan Buesgens:
Yeah. So there are going to be metrics that assess how well your model is fitting the data, things like the area under the curve, accuracy, precision, and recall. There's a whole suite of metrics like that, and then there are ways you might want to break down those metrics that are specific to your organizational values around model fairness. So you might want to assess not just how your model performs overall on the full population, but how it performs for certain protected classes.
In addition to that, one thing we've learned is that we can assess model quality and ensure the performance of a model as it moves into production stays the same, but that's not necessarily the same as saying the model's behavior is the same; a model with small tweaks may be getting to similar results through different means. So we also want to assess the behavior of a model, which is why you might be interested in looking into things like model explainability analytics and understanding what's actually happening under the hood.
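(One simple way to break those quality metrics down by protected class, sketched here with pandas and scikit-learn; the `group` column name and the choice of metrics are assumptions for illustration.)

```python
# Sketch: compute the same quality metrics per demographic group to check
# for disparate performance. `scored` is a pandas DataFrame with `label`,
# `score`, and an assumed `group` column identifying the protected class.
import pandas as pd
from sklearn.metrics import roc_auc_score, recall_score

def metrics_by_group(scored, threshold=0.5):
    rows = []
    for group, part in scored.groupby("group"):
        preds = (part["score"] >= threshold).astype(int)
        rows.append({
            "group": group,
            "n": len(part),
            "auc": roc_auc_score(part["label"], part["score"]),
            "recall": recall_score(part["label"], preds),
            "positive_rate": preds.mean(),  # useful for disparate-impact checks
        })
    return pd.DataFrame(rows)
```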
Kevin Clugage:
There's a related question, or at least related to your area, about the feature store database. Obviously, you talked about Feature Flow and the framework library that you add to the Databricks environment. What's the feature store database behind that?
Nathan Buesgens:
Yeah. So the short answer is Delta. We keep all of our data in Delta tables, managed by Databricks. A lot of the things we're looking for in a feature store, things like metadata management, metadata to associate your models with the features they're consuming, or just making the features discoverable so they can be reused across models, are available in the base data management technologies in Databricks, maybe with a little bit of tooling, and those capabilities are probably only going to get better with time. Sometimes there's additional infrastructure built on top of that to support online feature stores, low-latency access to the features; that's just not a problem we're solving right now, so we rely mostly on Databricks Delta.
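(A minimal sketch of that Delta-backed approach, with the database, table, and column comment invented for illustration: persist features as a Delta table and attach a comment so other teams can discover and reuse them. Assumes `features_df` and the ambient `spark` session from a Databricks notebook.)

```python
# Minimal sketch: persist computed features as a Delta table and add a
# table comment for discoverability. All names are illustrative.
spark.sql("CREATE DATABASE IF NOT EXISTS feature_store")

features_df.write.format("delta").mode("overwrite") \
    .saveAsTable("feature_store.member_savings_features")

# Attach descriptive metadata so the table shows up meaningfully in catalogs
# and dashboards that surface feature discoverability.
spark.sql("""
    ALTER TABLE feature_store.member_savings_features
    SET TBLPROPERTIES (
      'comment' = 'Rolling deposit and balance features from the savings-propensity pipeline'
    )
""")
```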
Kevin Clugage:
There's a question that I'll pose to Bryan: you mentioned some of the challenges around code silos and trying to tear down those silos. How do you break those down now at Navy Federal Credit Union so people can search across what other people have done?
Bryan Christian:
Absolutely. So, that ties directly to what Nate was just talking about, especially the discoverability with metadata in the feature store that's using the Delta technology. The idea of the feature store is that instead of everyone always building their own features, where my model uses the same feature that your model uses and we run them both in production separately, you can get efficiency gains by standardizing that and having both models point to that one feature. It also means that feature is discoverable for future research by data scientists going forward.
We already have thousands of features in our feature store. These are features that have been created for models that are in production or on their way to production now. And if someone wants to add another feature or another set of features, maybe a new data source comes in and we want to take a first pass at the first several hundred new features that would come with it, we certainly create those.
But when someone has a specific modeling use case that requires a more specific feature that's not in the feature store, one of the beauties of this construction is that the data scientist can pull what they need from the feature store, add the additional transformation or feature creation themselves, and then that gets absorbed into the feature store through future processes and becomes available for exploration by other data scientists as well. So it scales with use and with specific business cases, and it grows so other people can consume those same features, too.
Kevin Clugage:
Okay. Well, thank you to all of my fellow presenters and for everyone for joining us today.
Here’s more to explore
Discover our Solutions Accelerators for industry-specific use-cases:
https://databricks.com/solutions/accelerators
Learn more about Explainable AI:
https://databricks.com/explainable-ai
Check out the industry’s leading data and AI use-cases:
https://databricks.com/p/ebook/big-book-of-machine-learning-use-cases