Experimentation to Industrialization: Implementing MLOps

May 26, 2021 12:05 PM (PT)


In this presentation, drawing upon Thorogood’s experience with a customer’s global Data & Analytics division as their MLOps delivery partner, we share important learnings and takeaways from delivering productionized ML solutions and shaping MLOps best practices and organizational standards needed to be successful.

We open by providing high-level context & answering key questions such as “What is MLOps exactly?” & “What are the benefits of establishing MLOps Standards?”

The subsequent presentation focuses on our learnings & best practices. We start by discussing common challenges when refactoring experimentation use-cases & how to best get ahead of these issues in a global organization. We then outline an Engagement Model for MLOps addressing: People, Processes, and Tools. ‘Processes’ highlights how to manage the often siloed data science use case demand pipeline for MLOps & documentation to facilitate seamless integration with an MLOps framework. ‘People’ provides context around the appropriate team structures & roles to be involved in an MLOps initiative. ‘Tools’ addresses key requirements of tools used for MLOps, considering the match of services to use-cases.

In this session watch:
Al McEwan, Head of Capability Development, Thorogood Associates
Deb Lee, Senior Consultant, Thorogood Associates

 

Transcript

Deb Lee: Hello everyone and welcome to this Data and AI Summit session, Experimentation to Industrialization: Implementing MLOps, presented by Thorogood Associates.
My name is Deb Lee. I’m a senior consultant at Thorogood Associates and I help to lead our MLOps practice. I’m joined by my colleague, Al McEwan, a principal consultant at Thorogood. Al is our global head of capability development at Thorogood and he also happens to be a certified Databricks champion. Thorogood is an independent specialist consultancy focusing on the data and AI space. We deliver technical projects, keeping a keen focus on the business objectives of our enterprise customers, out of our offices in the U.S., U.K., Singapore, Brazil and India. We’ve officially been a Databricks partner since 2018 and Databricks is fundamental to many of our cloud-based data engineering and data science design patterns. Over the course of our time working at Thorogood Associates, both Al and I have gathered a good deal of experience delivering data engineering, data visualization and data science projects for a range of customers including GSK, Unilever, Johnson & Johnson, Cigna, and Morgan Stanley.
And, today’s discussion is grounded in recent experience delivering MLOps initiatives for a leading global consumer goods company. We will spend the next 30 minutes defining MLOps and discussing its business value, introducing a customer story to make the concept more tangible and then sharing some key learnings and takeaways that we think would be helpful for you to consider as you pursue your own MLOps initiatives.
Before we get into MLOps, let’s talk about what’s going on in the world today. Over the past decade, companies have become more attuned to the value that ML can bring to their organizations. Several developments have enabled the conditions for machine learning to thrive at new levels. Technological advances and changing consumer behavior have resulted in a proliferation of data. Advances in cloud computing have made storage and processing of that data cheaper and easier than ever before. And, the availability of open source libraries coupled with the evolution of analytic tools have made it possible to apply statistical techniques in new ways and at scale. With these conditions acting as enablers, many organizations have invested heavily in experimenting with machine learning and AI in their organizations.
The new IDC spending guide projects AI spend will reach $110 billion by 2024, and a Gartner survey showed that despite a global pandemic, 66% of organizations avoided cutting spending on AI and many actually increased their investments. A study that MIT Sloan did with BCG found that 71% of respondents felt they understood how AI would change the way their business generates value, but only 11% reported actually seeing significant benefit. So, why is that number so low? Today, we’re going to talk about why some companies are struggling to get the full benefit from their ML investments and the role that operationalization and MLOps must play to drive real value.
Machine learning operations, MLOps, refers to practices for orchestrating the development, deployment and maintenance of machine learning models in a scalable and standardized way. Until recently, across our customers and the wider industry, the build, test and deployment of ML models has been conducted on an ad hoc basis and the ongoing monitoring and maintenance of these models has not been subject to a formal governance process. This approach may suffice when the demand for ML modeling is light, and maybe it’s realistic for a data scientist to continuously oversee the models they create without necessarily developing the code base beyond an experimental version. However, in a data intelligent organization, data scientists need to split their time across many modeling exercises while the sophistication of data and models and the trust placed in those models calls for specialized software engineering input.
Over the past three years, MLOps principles have begun to emerge, defining an optimal state for production ML. MLOps ensures models are implemented robustly and efficiently, in an automated fashion, and are monitored over time so that any degradation in performance can be addressed. Operationalized models are scalable, more easily understood and trusted by the business, and enable ongoing experimentation in enterprise tools. This optimizes the cost and ongoing quality of models.
Through automation and the introduction of dedicated roles and processes, MLOps brings together the innovation and exploratory nature of experimentation with the robustness of operationalized systems, empowering data scientists to focus on experimentation and building new functionality, among other benefits. In an MLOps setup, data scientists receive the benefits of operationalization as they continue to experiment, such as advanced and automated model monitoring, alerts on model decay, and parallelization, to name some of the most important ones. If they are not doing this already, organizations must shift their thinking to embrace operationalization and the benefits that it can provide.
Why should you care about MLOps? MLOps is necessary to enable organizations to fully realize the benefits of machine learning at scale. A 2021 Statista report found that the biggest challenges to machine learning adoption include scaling up, reproducibility and version management, and duplication of efforts across organizations. MLOps addresses these challenges by enabling scalability, using enterprise tools and techniques for parallelization of model training to improve run times and keep infrastructure costs low. It will ensure reproducibility by using experiment tracking and native tools such as MLflow, control scripts for cluster configuration, deployments and library management, along with versioning of data sets and models. It will also maintain version security and compatibility by using licensed packages and consistent approaches to operating system and library management. And it will minimize the duplication of asset creation across the organization by tracking and registering assets using tools that enable sharing and greater collaboration, such as Delta Lake and MLflow.
There are additional benefits that MLOps provides that are also worth calling out. Building proper CI/CD machine learning pipelines will enable automation of training and retraining as needed and will reduce manual dependencies. Model quality and performance over time can be easily monitored, allowing MLOps teams to stay ahead of potential drift and model decay. Productionized systems will accelerate the pace at which enhancements and additional features can be added to products, increasing the model’s responsiveness to changes in the business. The use of tools such as model registries, automated unit tests, and logging of model metrics and parameters will increase traceability and auditability.
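To make the experiment tracking and model registration mentioned above concrete, here is a minimal sketch using MLflow. The experiment and model names ("sell_out_forecast") and the synthetic data are illustrative rather than taken from the engagement, and registering the model assumes a registry-backed tracking server such as a Databricks workspace.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the prepared feature table.
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)

mlflow.set_experiment("sell_out_forecast")  # illustrative experiment name

with mlflow.start_run(run_name="weekly-retrain"):
    # Log the parameters that define this training run.
    params = {"n_estimators": 200, "max_depth": 8}
    mlflow.log_params(params)

    model = RandomForestRegressor(**params).fit(X_train, y_train)

    # Log metrics so model quality can be compared across runs over time.
    mae = mean_absolute_error(y_valid, model.predict(X_valid))
    mlflow.log_metric("mae", mae)

    # Register the model so downstream jobs load a specific version by name
    # (requires a tracking server with a model registry, e.g. Databricks).
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="sell_out_forecast",
    )
```

Runs logged this way can be compared in the MLflow UI, and the registered model versions give a CI/CD pipeline something concrete to test, promote and deploy.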
With that, I’ll hand over to Al to speak to a recent case study.

Al McEwan: Thanks, Deb. Deb has introduced the concept of MLOps and its value to us. I’d now like to make this concept more tangible for everyone by discussing it in the context of a Thorogood customer, using a project where we’ve applied this in practice. In this case, the customer we engaged with is a leading consumer packaged goods company with hundreds of marquee brands in its portfolio, including over a dozen billion-dollar brands. As I’m sure you can imagine, the impact and importance of optimizing this portfolio is significant. This company has been investing in analytics experimentation and has created many ML solutions that are manually managed and maintained. Different functions of the business have achieved their ML capability by hiring and engaging with data scientists. However, they tended to operate in a relatively separate, business-unit-specific manner, following independent practices.
The company as a whole has recognized the benefit of scaling, automating and enhancing these ML solutions, and to achieve that, they required a global framework and a coordinated strategy. Thorogood worked with our customer to develop that strategy and framework. What I’ll do is introduce the framework on this slide and provide a view of the benefits on the next. It’s worth noting that we didn’t create an MLOps framework purely as an academic exercise. We picked a data science use case that had been identified by the business as being worth investing in and developed the MLOps strategy and framework in conjunction with a specific development project to scale and operationalize that use case.
The use case that was selected was to forecast sell-in and sell-out volumes for key e-commerce retailers in one of the company’s largest geographies. It was clearly an important solution for the organization in terms of its scale and value. As the development progressed, we created a set of generic, reusable templates. These templates will be used to accelerate the code and pipeline builds for future initiatives, thus improving the time it takes to generate value. The result was to produce robust documentation that provides guidance, rules and recommendations that can be used not only for executing, but also for preparing to execute, future MLOps initiatives. And we tested the documentation on the project along the way.
Deb will share more about the artifacts that we produced later in this presentation. We see them as helpful materials when embarking upon an MLOps journey. They cover some fairly obvious things, like coding standards and conventions, versioning, and so on. They also provide guidance on certain technologies, as well as some perhaps less obvious things, like how you assess whether a project is ready to undergo industrialization. That is a key question before committing resources.
I’m going to summarize the benefits the customer achieved from the framework definition. Simplification. It’s now easier for teams to industrialize their ML solutions, no matter what the business function is or where they’re located in the world. As an important stepping stone, it is now easier to get started and hopefully less scary. This lowers any perceived barriers and supports the business in maximizing their ML investments. Reliability. The business can trust data science outputs more. The business has greater visibility of deployed solutions and the business appreciates that there’s a robust framework. Just as important, the data scientists feel valued, since their solutions are robust and stable, they get more feedback about performance in production, and the outputs are relied upon by the business. As an example of the improved feedback in our use case, experiment tracking has gone from not being done at all to being done on every run.
Continuous improvement. ML solutions are not static. With new or changing business requirements and a changing business world, solutions often need to evolve to continue to be valuable, and the framework enables data scientists to see when adaptations are needed more easily than they could before. Reusability. Our framework encourages a reduction in the duplication of effort and emphasizes the reuse of existing assets. There is the potential to consolidate similar applications, including across areas where a more siloed approach may have overlooked synergies. Scalability. Sometimes the value of a data science project is only truly uncovered when scaling, accommodating more data, and running the model with more regularity. Time and cost savings. Minimizing duplicated effort reduces costs and puts valuable outputs into the hands of the business sooner. In addition to that, defining MLOps standards is allowing the organization to adopt recommended practices broadly and reduce cloud consumption costs.
For this particular use case, the data scientists went from spending six days per month running this solution to a 10-hour job with no human intervention during the run, with much richer result and performance feedback for the data scientists. The monitoring is automated, alerting the MLOps team when something needs attention, including notable variances in accuracy.
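As an illustration of the kind of automated accuracy check that might raise such an alert, here is a minimal sketch using the MLflow client; the metric name, the 20% threshold and the notification step are assumptions for the sake of the example, not the customer’s actual implementation.

```python
from mlflow.tracking import MlflowClient

# Hypothetical rule: flag a run whose MAE deviates more than 20% from the
# average of the previous production runs.
VARIANCE_THRESHOLD = 0.20

client = MlflowClient()
experiment = client.get_experiment_by_name("sell_out_forecast")  # illustrative name
runs = client.search_runs(
    [experiment.experiment_id],
    order_by=["attributes.start_time DESC"],
    max_results=10,
)

maes = [r.data.metrics["mae"] for r in runs if "mae" in r.data.metrics]
if len(maes) >= 2:
    latest, baseline = maes[0], sum(maes[1:]) / len(maes[1:])
    if abs(latest - baseline) / baseline > VARIANCE_THRESHOLD:
        # In practice this would post to the team's alerting channel
        # (email, Teams, PagerDuty, ...); print stands in here.
        print(f"ALERT: MAE {latest:.3f} deviates from baseline {baseline:.3f}")
```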
These are all benefits that the customer is enjoying now that they have a defined MLOps framework. Of course, the big goal is for the customer to win in the fast-growing e-commerce arena by being a better partner, and with the data scientists better set up, the business is better placed.
I’m going to hand back to Deb now for the next section, where she will explain more about what we learned through this exercise, so that you can consider these as you embark upon or continue your own MLOps journey. Over to you, Deb.

Deb Lee: Thanks, Al. So, what are these key takeaways and learnings that we think you should consider as you pursue an MLOps journey? To keep it simple, we’re going to discuss three key pillars: people, processes and tools. Underpinning these areas, you have your organization’s data, internal and external data sources, structured and unstructured data sets, with the potential for massive data volumes. Any MLOps initiative should prioritize data quality and ensure that data can be consumed and refreshed in an automated fashion.
In terms of people, we will address the skills needed and how teams can be structured to deliver on the promise of MLOps. In terms of processes, we will share what we think are helpful artifacts to produce to guide future work in this space. Finally, in terms of tools, we’ll consider how tool selection is made but also why Databricks is a particularly compelling option.
Only a small fraction of real world operationalized ML systems is composed of the ML code, as shown by the small red box in the middle. The required surrounding infrastructure needs are vast and complex and require specialized skill sets in all of these areas to support MLOps engagements. However, while the final amount of model code you end up deploying after many months of experimenting and evaluating may be relatively small compared to the overall amount of code in the system, the time, skills and effort to create it should not be understated nor overlooked. Those lines of code are the result of a wealth of investigation and experimentation. The surrounding components are all stable, robust and proven, and are becoming more mainstream. Model quality is still key to delivering value using data science applications with MLOps.
What we’re seeing at our customers right now is that specific consideration and care needs to be given to how data scientists operate as the wider data science governance focus shifts to MLOps. Data scientists must be empowered to continue experimenting without feeling hindered by more processes and teams to work with, but similarly, they need to understand how they can benefit from embracing key MLOps practices that have not traditionally been a part of the typical data scientist toolkit. It would be largely unrealistic to expect data scientists to take on the many skill sets that fall outside of those needed for experimentation in order to build models that prioritize production quality, as theirs is already a highly specialized set of skills spanning mathematics, statistics and computer science.
However, operationalization can be streamlined if data scientists are guided to focus on enabling scalability: for example, modularizing Python functions and classes so that they can be easily called by a grouped pandas user-defined function. Efficiency: adding unit testing and avoiding hard coding as you develop. And reproducibility: using environment scripts and cluster scripts, tracking your experiments, and registering models within a code management system with branching. What we have seen is that if data scientists can focus on these elements in their code as they start experimentation, the transition to productionized models will be significantly less painful and will set them up to continue experimenting in a production setting more easily.
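A minimal sketch of that modularization, under the assumption that the data sits in a Spark DataFrame of weekly volumes per retailer (the column names and toy data are illustrative): the modelling logic lives in a plain Python function that can be unit tested on pandas data, and Spark’s applyInPandas then parallelizes it per group.

```python
import pandas as pd
from pyspark.sql import SparkSession
from sklearn.linear_model import LinearRegression

spark = SparkSession.builder.getOrCreate()

# Keep the modelling logic in a plain, testable function ...
def fit_forecast(pdf: pd.DataFrame) -> pd.DataFrame:
    model = LinearRegression().fit(pdf[["week"]], pdf["volume"])
    pdf["forecast"] = model.predict(pdf[["week"]])
    return pdf[["retailer", "week", "forecast"]]

# ... so Spark can run it once per retailer, in parallel.
sdf = spark.createDataFrame(
    pd.DataFrame({
        "retailer": ["A"] * 6 + ["B"] * 6,
        "week": list(range(6)) * 2,
        "volume": [10, 12, 13, 15, 18, 21, 5, 6, 8, 9, 9, 11],
    })
)

forecasts = sdf.groupBy("retailer").applyInPandas(
    fit_forecast, schema="retailer string, week long, forecast double"
)
forecasts.show()
```

Because fit_forecast takes and returns plain pandas DataFrames, it can be exercised by unit tests without a Spark cluster, which is exactly the kind of habit that makes later industrialization less painful.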
Thinking more widely than just the data scientists, there are a number of other types of capabilities and skills needed to successfully deliver and support production ML systems. In our experience, the following capabilities are needed. Machine learning engineering. Experience building and testing end-to-end data science and data engineering solutions, with a strong background in CI/CD and systems delivery. So, defining the guidelines for the code-based delivery strategy and speeding up that development life cycle. Data science. The ability to take prepared data and use a variety of methods to extract insights. Data scientists usually have a strong background in disciplines like math, statistics and computer science. They’re often tasked with building machine learning models, testing those models, and keeping track of their experiments.
Data engineering. Develop, construct, test and maintain data pipelines, which are the mechanisms that allow data to move between systems or people. Take the data from its raw source and move it along the pipeline to where it can be used at different stages of a data science project. Typically, we see a strong background in CI/CD and systems delivery and comfort working in code-based and [inaudible] tools. Program management. Key to coordination across experimentation, operationalization and business teams to manage the engagement, manage and coordinate project delivery and the solution backlog, and take responsibility for issue resolution and risk management, change management, project priorities, status communications and meetings.
Solution architect. Someone with a deep understanding and technical knowledge across the end-to-end solution to fortify the delivery, validate checkpoints in the flow, and sign off on overall designs and architectures. Data visualization. Design, develop, test, and maintain front-end screens for monitoring and model consumption for different target audiences. We typically see a strong background in UI/UX design principles and semantic layer design for reporting across the most common front-end tools.
So, while a wide variety of skills and experience are needed across any team orchestrating the development, deployment and maintenance of ML models in a scalable and standardized way, we found that the blend needed will largely depend on the nature of the engagement. So, let’s say you’re operationalizing a POC that requires real-time model serving with a web app component and you need to create all of the surrounding pipelines and infrastructure from scratch. Or, let’s consider a scenario where a model has been deployed and you’re looking to improve the monitoring capabilities to stay ahead of potential drift and model decay. In this second scenario, the skills needed to deliver may be quite different from scenario one.
So, while these six core capabilities will be required for any MLOps project, it’s also important to recognize that the relative blend will depend largely on the nature of the engagement. Coverage of the individual expertise areas is obviously vital, but we find it very powerful to have individual team members who possess a combination of these skills. The greater the overlap and empathy between roles, the smaller the team can be and the easier communication is. With a good degree of skills overlap, projects can move faster and with greater effectiveness.
Next, I’m going to speak to the processes that we have implemented with our customers and how these have helped them transition from ad hoc experimentation environments to robust and automated operationalization and ongoing monitoring of models. There are a number of moving parts needed for MLOps to be successful. The use of frameworks and defining processes are what enable success. And, while it may seem straightforward to lay out a strategy, getting everything to work is fairly process-intensive. Where we have seen the greatest success in bringing MLOps frameworks to life is through the creation and use of artifacts that provide governance, checkpoints and guidance that can then be tested alongside technical implementations.
First, we’ve created questionnaires that are used to qualify use cases and projects in the pipeline for onboarding to the MLOps framework. This standardized template is shared with POC and experimentation teams to assess the suitability of the model for productionization, considering the business value and demand, the technical framework, and the product road map. Operationalization requires resources and hence an investment, so qualifying which models should be onboarded to production operations early on ensures resources are being used efficiently.
The ML test score measures the overall readiness of the ML system for production. The final ML test score is computed as follows. For each test, half a point is awarded for executing the test manually, with the results documented and distributed, and a full point is awarded if there is a system in place to run that test automatically on a repeated basis. You then sum the scores within each of the four sections individually: data tests, model tests, ML infrastructure tests, and monitoring. The final ML test score is then the minimum of the scores across those four sections.
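As a small worked example of that scoring rule (the individual test counts below are purely illustrative):

```python
# 0.5 per test executed manually with documented, distributed results;
# 1.0 per test that runs automatically on a repeated basis.
section_scores = {
    "data_tests": 0.5 + 1.0 + 1.0,
    "model_tests": 0.5 + 0.5 + 1.0,
    "ml_infrastructure_tests": 1.0 + 1.0,
    "monitoring_tests": 0.5 + 1.0 + 1.0 + 1.0,
}

# The final ML test score is the minimum of the four section totals,
# so the weakest area determines overall production readiness.
final_score = min(section_scores.values())
print(section_scores, "->", final_score)  # -> 2.0
```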
I’ll speak a bit more to this in a bit, but for anyone embarking on a data science project, we’ve created guides on which tools to use that consider the technology road map, the model training volumes, the libraries used, the serving method (batch or real time), and the parallelization and retraining frequency. For every decision point in the data science life cycle for tool selection, so data ingestion, data engineering, model training, validation and experiment tracking, model serving, model monitoring and job orchestration, we’ve laid out the important considerations to keep in mind when selecting the tool you use for each step along the way.
We’ve also created general playbooks that contain guidelines for experimentation and operationalization to streamline the MLOps process. While this may seem fundamental, we’ve found that establishing these guidelines serves as a foundational component of change management, helping data science teams level-set on MLOps standards and key considerations.
Finally, a major challenge that we have seen across all of our customers and data applications generally is around end to end data governance and reproducibility. To address these challenges, anyone looking to get started with MLOps should strongly consider implementing the use of a reproducibility checklist for all products. This checklist requires code versioning, data versioning, model versioning and a model registry, cluster configuration, and environment specification.
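A minimal sketch of how that checklist might be captured on every run, assuming MLflow is the tracking tool; the commit hash, Delta table version, cluster label and library pins are placeholder values that a real pipeline would read from its environment.

```python
import mlflow

# Placeholder values; in practice these come from CI variables,
# the Delta table history, and the job's cluster configuration.
code_version = "9f2c1ab"        # git commit the job was built from
data_version = 42               # Delta table version used for training
cluster_spec = "10.4.x-cpu-ml"  # runtime / cluster the job ran on

with mlflow.start_run(run_name="reproducibility-tags"):
    mlflow.set_tags({
        "git_commit": code_version,
        "training_data_delta_version": data_version,
        "cluster_spec": cluster_spec,
    })
    # Pin exact library versions alongside the run's artifacts.
    mlflow.log_text("scikit-learn==1.0.2\npandas==1.4.2\n", "requirements.txt")
```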
The final, but equally important, aspect of an MLOps framework to consider is the tools used. As I mentioned on the last slide, we have used decision trees to provide a simple guide to which tools to use at various decision points for data scientists and ML engineers. The decision on which tool to use can be taken anywhere in the ML project life cycle, but often it’s most impactful and important to consider the best tool for the use case when you’re just starting out or when you’re experimenting. This may necessitate that data scientists think more broadly than the immediate task at hand and consider how the model may need to scale to accommodate other geographies, markets, functions, channels and products.
Another obvious decision point for either switching to a different tool or service, or maintaining the use of what has been used to date, is when the initial industrialization is taking place. However, you’re not precluded from switching ML services at some point in the future. When deciding on which tool or tools to use, it’s important to consider the various aspects of the project: model training and evaluation, orchestration, deployment and tracking. As part of the framework and artifacts that we produced for our customer to define the MLOps framework, we constructed decision trees for each of these key aspects to assist decision making for teams in the future.
Databricks is a best-in-class tool for data science and one we think is invaluable to a cloud-based data architecture. I’m now going to hand it over to Al, as Thorogood’s resident Databricks champion, to explain why we think Databricks is optimally positioned to help deliver on the promise of MLOps.

Al McEwan: Thanks, Deb. So, why is Databricks so well aligned with MLOps? It boils down to several key points. Firstly and fundamentally, Databricks is Spark plus. Spark is a powerful engine for large-scale data processing and Databricks makes this readily available on the cloud with many of the necessary capabilities pre-configured. Databricks offers value from both the data engineering and the data science perspective, creating what Databricks refers to as a unified analytics platform.
Databricks is a best-in-class data science tool, meaning that it is widely used for data science experimentation and is thus, as we’ve already discussed, a natural platform on which to industrialize, since that’s where the code already exists. Databricks also integrates an offering called MLflow, which provides a strong platform to manage the ML life cycle, and that’s critical for MLOps.
Finally, Databricks isn’t just available on one major cloud platform. It’s now available across the major cloud platform vendors: Microsoft Azure, Amazon Web Services and Google Cloud Platform. This means that regardless of which cloud you use, you can tap into the power of Databricks. It also makes your code investments in Databricks relatively portable, enabling an organization to pursue a multi-cloud strategy, since it’s easy to deploy the same notebooks onto another cloud platform as necessary.
And, last but not least, Databricks is integrated with prominent serving technologies such as SageMaker and MLflow, and reporting technologies such as Tableau, Power BI, and SQL Analytics. So, it gels well with existing tools and technologies.
I’m going to conclude this presentation with a few suggestions on how to get started with ML solutions. Firstly, some fairly obvious ones. Take a look at where you are and where you want to get to. What do you want to achieve? What are the business goals? Also, consider how you’d expect ML solutions to fit into your existing business processes. The people, tools and processes, and how they mesh together, are a key part of how to achieve those goals. Thorogood supports organizations in orienting themselves to deliver ML solutions, and we can work with you within the context of your organization. Please feel free to review the materials available on our website and contact us to see how we can help. Thank you for listening.

Al McEwan

Al McEwan is a principal consultant at Thorogood Associates, a Databricks partner since 2018. He is also a Databricks Champion. Al has been heavily involved with our Databricks partnership since its i...

Deb Lee

Deb is a motivated, passionate, and impact-focused senior consultant at Thorogood Associates, a Databricks partner since 2018. She has extensive experience leading end-to-end engagements for data engi...