CI/CD for MLOps: Introducing a Framework to Make Every Process Self-Service

May 26, 2021 03:15 PM (PT)


How can companies create predictable, repeatable, secure self-service workflows for their Data Science teams? Discover how J. B. Hunt, in collaboration with Artis Consulting, created an MLOps framework using automated conventions and well-defined environment segmentation. Learn how you can achieve predictable testing, repeatable deployment, and secure self-service Databricks resource management throughout the local/dev/test/prod promotion lifecycle.

The first part of the talk will focus on the core values, concepts, and conventions of the framework. The second part of the talk will include a technical demo of how to implement the self-service automation of Databricks resources and code and jobs deployment into Azure DevOps CI/CD pipelines.

In this session watch:
Cara Phillips, Developer, Artis Consulting
Wesly Clark, Architect, J.B. Hunt

 

Transcript

Wesly Clark: Thank you for joining us. We appreciate your time, and I hope you’ll find our work interesting and informative, and that it will help you in establishing your own automated CI/CD pipelines. My name is Wesly Clark, and I’m the chief architect of enterprise analytics and AI at J.B. Hunt. And my colleague, Cara Phillips from Artis Consulting, is here with me to present the technical demo.
J.B. Hunt was founded in 1961, and has grown to become one of the largest transportation and logistics companies in North America. We’re currently number 346 on the Fortune 500, and our digital brokerage marketplace, J.B. Hunt 360, has received widespread recognition for innovation and technology. We consider machine learning and advanced analytics as key to our future success, and I’m honored to be involved in establishing these disciplines at J.B. Hunt. Artis Consulting was founded in 2002, and they focus on four pillars: data and analytics, AI and machine learning, the internet of things, and intelligent applications. They’ve been a key partner in helping turn our vision into a robust production process. I want to begin by orienting you to the role analytics plays in creating business value, highlighting which parts of the MLOps life cycle this framework focuses on, and discussing the guiding principles we adhered to when implementing our solution. Then we’ll get to the part you really came for: the technical demo and the practical steps you can take to create your own solution.
We want to emphasize that everything we do in analytics, data science, and machine learning should focus on creating business value and accomplishing the objectives of our organizations. As scientists, we could easily lose ourselves in the numbers. So, we intentionally refocus ourselves on the people we’re trying to empower, and the processes in which our solutions will be embedded. I’ve watched quite a few fantastic talks about data engineering, hydrating the Delta Lake, and creating feature stores. Likewise, there are plenty of presentations that focus on containerization, serving, and production performance monitoring. Today, I want to focus on a secure, self-service framework for automating the creation and deployment of compute environments linked to specific project branches of a product’s code repository. Before I show you the framework architecture and implementation, I want to speak briefly to the guiding principles on which our solution was established.
We were aiming for predictability. So, we chose convention over configuration. We wanted it to be automated, because the real power of these conventions is realized when the user doesn’t have to remember the rules to see them in action or to implement them. We wanted to strike a balance between making it secure and self-service. So, we sought to empower our users while simultaneously providing boundaries to keep them safe. We wanted this framework to emphasize repeatability: by creating configuration artifacts that follow the code through the entire ML lifecycle, we ensure repeatable deployments and environment creation. We wanted to introduce clearly defined environments. We weren’t just seeking to automate the workflow we already had, but to create new possibilities through the tools we were giving to our teams. We introduced our analysts, engineers, and scientists to some of the most robust concepts from enterprise-scale software engineering development life cycles. And we wanted it to be platform- and cloud-agnostic.
We weren’t multi-cloud when we started this, but Databricks was. So, it was important to us to choose solutions that could run anywhere. I don’t have time to cover all the decisions we made in detail, but I want to give you a high-level understanding of the framework. The first thing I want to draw your attention to is a set of config files that are stored in the user’s code branch. The environment config file is where you would store values that change based on which environment your code is running in. Next, the cluster and library config lets you define the dependencies and compute resources needed for your code to run. It also lets you specify who should have access to your project. Lastly, the jobs config is where you would store the instructions for how your code should be deployed as it moves towards production.
Next, I want to explicitly define what it means for your code to be deployed to a specific environment. It means that a fresh copy of your code has been pulled into the Databricks project folder and is synced with your repository branch. It also means that your code runs on a dedicated cluster meeting the specific requirements defined in your branch’s config files. Lastly, it means that the cluster is operating under the authority of a service principal that only has access to the appropriate resources in the corresponding infrastructure environment: local, dev, test, and prod. That means you’re using different secrets, storing your data and files in different containers, and accessing different versions of the web services depending on which environment you’re running in. Now, let’s talk a little bit more about how the CI/CD pipeline works alongside your product code. We’ll walk through a self-service cluster management scenario.
Let’s imagine that a user with an existing repository for their product code was going to start using the CI/CD framework for the first time. Thankfully, the CI/CD repository has a setup pipeline to help get them started. Step one illustrates the first phase of the setup pipeline, which will transfer the config files we just talked about on the previous diagram into the user’s product repo. Step two shows that the second phase of the setup pipeline will transfer a few YAML files and create child pipelines in the product repo. One of these child pipelines is used to initiate job deployments to new environments. And the other pipeline, which is relevant to this scenario, listens for committed changes made to the config files. Once the initial setup pipeline has finished transferring files and creating child pipelines in the product repository, let’s imagine that in step three of this diagram, the user modifies the cluster and library config.
They change the maximum number of nodes for the cluster, add a third-party library to their list of dependencies, and modify the access control list to add a teammate to their project. In step four, after the user commits the change to the cluster and library config file, the listener pipeline would trigger and initiate a callback to the CI/CD repository. Step five emphasizes that the majority of the functionality lives in the CI/CD repository, where a YAML pipeline would execute a series of PowerShell scripts that validate the config file changes and convert them into three separate JSON files, which are then sent separately to the Databricks cluster, library, and permissions APIs. Finally, in step six, you see that after a few moments, these changes would be completed by the Databricks APIs, and the user would see the updated cluster in the Databricks UI with the new library loaded onto it, accessible only by the team members specified in the access control list.
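For illustration, a minimal PowerShell sketch of the fan-out in steps five and six might look like the following, assuming the pipeline has already validated the config and written three JSON payload files. The file names, environment variables, and cluster ID are hypothetical; the three endpoints are the standard Databricks Clusters, Libraries, and Permissions APIs.

```powershell
# Minimal sketch: send the three generated JSON payloads to their respective Databricks APIs.
# Assumes $env:DATABRICKS_HOST (e.g. https://adb-123456789.0.azuredatabricks.net) and
# $env:DATABRICKS_TOKEN are set; the output file names and the cluster ID are placeholders.
$headers   = @{ Authorization = "Bearer $($env:DATABRICKS_TOKEN)" }
$base      = $env:DATABRICKS_HOST
$clusterId = "<existing-cluster-id>"

# 1. Cluster definition -> Clusters API (edit the branch's existing cluster).
Invoke-RestMethod -Method Post -Uri "$base/api/2.0/clusters/edit" -Headers $headers `
    -ContentType "application/json" -Body (Get-Content "out/clusterConfig.json" -Raw)

# 2. Library list -> Libraries API (install anything newly requested).
Invoke-RestMethod -Method Post -Uri "$base/api/2.0/libraries/install" -Headers $headers `
    -ContentType "application/json" -Body (Get-Content "out/libraries.json" -Raw)

# 3. Access control list -> Permissions API (restrict the cluster to the listed team members).
Invoke-RestMethod -Method Patch -Uri "$base/api/2.0/permissions/clusters/$clusterId" -Headers $headers `
    -ContentType "application/json" -Body (Get-Content "out/permissions.json" -Raw)
```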
All right, one last diagram before we get to the technical demo. I want to reemphasize the kind of workflow this enables: continuously creating enhancements to your product without environment conflicts. In this image, you see two separate branches of the same product being worked on by two different users in the local environment. Each user has a separate copy of the product notebooks and config files stored in their local projects folder under their username. Each runs on a dedicated cluster set up according to the user’s config. Each cluster could have completely different properties and run different versions of libraries without conflict. Once you get past local development, only one branch of a product can be deployed at any given time. Over in the dev environment, an older version of the product already committed to master is on its way to being productionalized. It’s already past the first quality gate enforced by our CI/CD job deployment pipeline.
It also has a separate copy of the notebooks and config files stored in the dev projects folder. It is using an older copy of the library, and runs on a dedicated job cluster meeting the specifications of an older version of the config files. Unfortunately, that code failed to pass the stricter quality gate to get into the more closely guarded testing environment. So, both test and prod are still retraining using code from a previous master release tag. Both environments have a separate copy of the notebook and config files stored in the test and prod projects folders, and run on dedicated job clusters, still using an even older version of the library.
In all environments, local, dev, test, and prod, each cluster acts under the authority of an appropriate environment service principal to interact with environment-specific instances of tables, file storage, external webhook and service integrations, and to publish job events. These environment-specific service principals have also been granted permissions to invoke those deployed jobs so that they can be initiated from outside of Databricks using the service principal credentials. I know that was a lot of conceptual material to cover. Thank you for bearing with me through all of the abstractions and theory behind the framework. Now I’m going to turn it over to my friend, Cara, to show you how you can implement something like this one step at a time.

Cara Phillips: Thanks Wesly. Hi everyone. My name is Cara Phillips and I’m a Data Science and MLOps consultant at Artis Consulting. As Wesly just mentioned, I’m going to show you these pipelines in action. So, let’s start by taking a look at the file structure in the CI/CD repo. The first things to notice are these YAML files, which are the templates that contain the steps that our pipeline will execute. Right above these files, we have a scripts folder. These are all of the scripts that are going to be run in each step of the pipeline. We decided to use PowerShell scripts, but you can use any language that can parse JSON files and send data to the Databricks API. I want to pause here for a bit and discuss the logic behind how we designed some of these scripts. So, the goal behind developing this code was to make as much of it reusable as possible.
So, with that goal in mind, we designed these scripts to be modular and functionalized so that we could reuse the same scripts in different kinds of pipelines. One of the keys to accomplishing this was to create an environment variable for each pipeline called pipeline type. This environment variable, which you will see when we look at the YAML files in a second, indicates whether the pipeline being run is the cluster pipeline or the jobs pipeline. This variable indicates which modules or functions within each of the PowerShell scripts will run. Most of the code is common between both pipelines, but there are other modules of code that need to be run, and certain environment variables that need to be set, based on which pipeline is running. And we use the pipeline type variable to direct that process. One example of the different code modules is in the cluster configuration generation process that sets the cluster configuration for clusters and jobs. The logic behind how that process works, and what the final result looks like, is different for the cluster pipeline than for the jobs pipeline.
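As a rough sketch of that dispatch idea, under the assumption of a PIPELINE_TYPE environment variable and illustrative file names (not the framework’s actual names), the pattern is simply:

```powershell
# Illustrative only: one set of scripts, two pipelines, selected by an environment variable.
$pipelineType = $env:PIPELINE_TYPE   # "cluster" or "jobs", set in the pipeline's YAML

switch ($pipelineType) {
    "cluster" {
        # Interactive cluster for a feature branch: one cluster config, named after the branch.
        $configPath   = "clusterAndLibrariesConfig.json"
        $generateMode = "InteractiveCluster"
    }
    "jobs" {
        # Job deployment: one cluster config per job, plus an optional shared high concurrency cluster.
        $configPath   = "jobsConfig.json"
        $generateMode = "JobClusters"
    }
    default { throw "Unknown pipeline type '$pipelineType'" }
}
Write-Host "Generating $generateMode configuration from $configPath"
```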
So, the pipeline type variable tells the pipeline which of these two modules to run. At the top here, we have two folders, one for data science and one for data engineering. Both the data science and data engineering teams are using these pipelines. And these folders contain configuration default values that can be set at the team level. For example, the data science team can use these files to set default values for their cluster or jobs configurations to set default permissions or to set libraries that will always be installed on their clusters. Let’s take a closer look at the templates in the data engineering folder. The first one we’ll look at is the cluster config template. As you can see here, this JSON contains all the elements in the correct format required by the cluster’s API. Some of the elements have values already populated, which cannot be changed by the users.
Later, we’ll see how each of the users sets values in their cluster and library config JSON file, which the pipeline uses to update the null values you see in the template here. Some of these null values are set by the pipeline according to the organizational conventions and are not dependent on user inputs. Because the pipeline can automatically set these values, the users are not required to memorize any naming conventions or specific file paths. One example of a value set by the pipeline is the cluster name. Instead of users providing a name for their cluster according to some kind of naming convention, the pipeline uses the environment, repo name, and branch name to construct the cluster name automatically. The next template is the libraries template. Here the team specifies any libraries that need to be installed on every cluster created in the workspace. This template replicates the install-automatically-on-all-clusters functionality in Databricks, which, as of the time of this talk, is not supported for runtimes greater than 7.0.
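To illustrate the cluster naming convention mentioned a moment ago, a sketch like the following could derive the name from Azure DevOps’ predefined variables; the exact pattern J.B. Hunt uses isn’t spelled out in the talk, so the format below is hypothetical.

```powershell
# Hypothetical illustration: build the cluster name from environment, repo, and branch,
# using Azure DevOps predefined variables, so users never have to type a name themselves.
$environment = "dev"                              # local/dev/test/prod, set per pipeline stage
$repoName    = $env:BUILD_REPOSITORY_NAME         # e.g. "demand-forecasting"
$branchName  = $env:BUILD_SOURCEBRANCHNAME        # e.g. "feature-price-model"

$clusterName = "$environment-$repoName-$branchName".ToLower()
Write-Host "Cluster name: $clusterName"
```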
Next, we’ll look at the jobs config template. Like the cluster config template, this JSON has all of the elements required to create and edit jobs through the API. The pipeline will use the values provided by the user in their jobs config JSON file to update some of the null values here. The remaining null values will be set by the pipeline. Again, users aren’t required to memorize any naming conventions, and the pipeline will automatically construct the required values for them. The last template here is the access control list template. Here the team can set default permissions for jobs and clusters. Later, we’ll see how additional cluster permissions will be set in the cluster and libraries config JSON. In this template, we are giving Can View permissions for all jobs created or edited by the pipeline to the specified ACL groups. Now that we have a better understanding of how the templates work, the last folder to look at here is the files-for-remote-repository folder.
This folder contains all the files we’ll need to copy into the product repo. And once those files are copied over, the pipelines can be built in the product repo. The process of copying these files and creating the pipelines can be done manually or, as Wesly reviewed earlier, by another pipeline we call the setup pipeline. So, let’s take a look at the setup pipeline. The first part of our YAML file is the parameters. The first parameter, called remote repo name, tells the pipeline which repo to copy files into or create pipelines in.
The next two parameters indicate which steps of the pipeline should be run, so you can have the pipeline only copy the config files, or only create the pipelines, or both. For the copy files step, we use an Azure CLI step to copy files between the CI/CD repo and the product repo. The file copy does not overwrite any config files that are already in the repo. Notice the condition here; this step will only execute if the copy files parameter is set to the value copy files. The last step is the create pipelines step. Again, this step only executes if the create pipelines parameter is set to the value create pipelines. This step uses the az pipelines create command in the Azure CLI to create the pipelines in the product repos. Let me show you what the parameters look like when you go to run the pipeline.
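For reference, a sketch of that create-pipelines step with the Azure DevOps CLI is shown below. The organization, project, repo name, and YAML paths are placeholders, and it assumes the azure-devops CLI extension is installed and already authenticated.

```powershell
# Sketch of the create-pipelines step using the Azure DevOps CLI (azure-devops extension).
# Organization, project, repo name, and YAML paths are placeholders.
$remoteRepoName = "my-product-repo"
az devops configure --defaults organization="https://dev.azure.com/your-org" project="your-project"

# Listener pipeline that watches the cluster and libraries config in the product repo.
az pipelines create `
    --name "$remoteRepoName-cluster" `
    --repository $remoteRepoName `
    --repository-type tfsgit `
    --branch master `
    --yml-path "pipeline-definitions/pipeline-cluster.yml" `
    --skip-first-run true

# Jobs deployment pipeline for the same repo.
az pipelines create `
    --name "$remoteRepoName-jobs" `
    --repository $remoteRepoName `
    --repository-type tfsgit `
    --branch master `
    --yml-path "pipeline-definitions/pipeline-jobs.yml" `
    --skip-first-run true
```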
The first parameter you see here is the remote repo name. This is a free text field where you can enter the name of the product repo. The next two parameters are radio buttons you can select to set which steps in the pipeline will run. I’ll go ahead and select both steps and run the pipeline now. Let’s go over to the product repo and take a closer look at the file structure.
The first folder we have is the notebooks folder. This folder contains all of the notebooks that are linked to the Databricks workspace. The remainder of the files are the ones that were copied from the CI/CD repo. First, we have the pipeline definitions folder, and this folder contains YAML files that will trigger the execution of the pipeline steps stored in the CI/CD repo. The last files we’re going to look at are the cluster and libraries config JSON file as well as the jobs config JSON file. The user will use these two files to create and manage the configuration for their clusters and jobs. So, the first thing the user needs to do once the repo is set up is to create a new user branch from master.
Once they’ve created their user branch, they can begin editing the config files. So, let us take a closer look at the cluster and libraries config file. The first parameter they’re going to configure is the workspace name. This value will determine which folder in the CI/CD repo our default config values will come from, as well as which Databricks workspace the pipeline will be authenticating to. The cluster section here contains a subset of parameters that the user can set for their cluster configuration. The structure of most of these values is directly equivalent to the JSON structure required by the API, with the exception of the Spark version. Here, we created a structure where the user only needs to specify the version number, whether or not they want an ML runtime, which contains many machine learning packages by default, and whether or not they need GPU compute on their cluster. The pipeline maps these values to the correct key that’s required by the API.
The last set of parameters in this section I want to call attention to is the custom tags. These tags are very important for tracking and managing your Databricks spend. So, in addition to the J.B. Hunt specific tags you see here, the pipeline automatically sets tags for the environment, which is either dev, test, or prod, as well as the repo and branch that triggered the creation of the cluster. The next section is the access control list. Here the user lists the emails for everyone working on their code in the branch. The pipeline will use this list of emails to set the permissions on the cluster. This ensures that the created cluster is reserved exclusively for the team members working on that branch. The last section in this file is for the libraries. For each library type, the user specifies a list of libraries they want installed on their cluster. For Python and CRAN packages, if the required package repo is different from the default repo, they can specify that using the package and repo structure here.
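The file itself isn’t reproduced in this transcript, so here is a hypothetical sketch of what a cluster and libraries config along these lines might contain, together with the kind of Spark version mapping the pipeline performs. The field names are illustrative; only the resulting runtime key format (for example 8.2.x-cpu-ml-scala2.12) comes from Databricks.

```powershell
# Hypothetical cluster and libraries config (illustrative schema, not the framework's actual file).
$configJson = @'
{
  "workspace_name": "data-science",
  "cluster": {
    "spark_version": { "version": "8.2", "ml_runtime": true, "gpu": false },
    "node_type_id": "Standard_DS3_v2",
    "autoscale": { "min_workers": 1, "max_workers": 7 },
    "custom_tags": { "cost_center": "analytics-demo" }
  },
  "access_control_list": [ "user1@example.com", "user2@example.com" ],
  "libraries": {
    "pypi": [ { "package": "xgboost==1.4.2", "repo": null } ],
    "cran": [ { "package": "forecast", "repo": null } ]
  }
}
'@
$config = $configJson | ConvertFrom-Json

# Map the simplified Spark version fields to the runtime key the Clusters API expects,
# e.g. "8.2.x-scala2.12", "8.2.x-cpu-ml-scala2.12", or "8.2.x-gpu-ml-scala2.12".
$v = $config.cluster.spark_version
if ($v.ml_runtime -and $v.gpu) { $sparkVersion = "$($v.version).x-gpu-ml-scala2.12" }
elseif ($v.ml_runtime)         { $sparkVersion = "$($v.version).x-cpu-ml-scala2.12" }
else                           { $sparkVersion = "$($v.version).x-scala2.12" }
Write-Host "Resolved spark_version: $sparkVersion"
```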
Now, let’s make a couple of changes here, save the file, and watch the cluster get created. We’re going to change the max workers to seven, and save the file back. Now, let’s go take a look at the pipeline. So you see here that the pipeline was triggered when I saved the file, and it is automatically running. While we wait for that to finish, let us talk a bit about what’s going on behind the scenes. So, back in the product repo, there’s a file called pipeline cluster, and this file listens for changes to the cluster and libraries config JSON and calls back to the CI/CD repo to execute the pipeline steps that are stored there. So, let’s go over to the CI/CD repo now and look at those steps. At the bottom of the repo here, we have a YAML file called pipeline cluster config. This file contains all the steps that are going to create and manage our Databricks clusters.
The first couple of steps here do some administrative work to set up the pipeline environment. And then we get to this generate cluster config step. This step parses the values provided by the user in their config file, combines them with the cluster config default template, and creates the JSON to send to the Clusters API in the next step. The create or edit cluster step uses that first JSON file to either create a new cluster if one doesn’t already exist, or to edit an already existing cluster. So, once our cluster is created, all we have to do is add or update the permissions. The set cluster permissions step takes the list of users from the config file, parses them into the proper JSON structure, and sets the permissions on the cluster using the Permissions API.
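A condensed sketch of that generate-and-apply logic could look like the following. The template path, config field names, and cluster name are hypothetical, while clusters/list, clusters/create, and clusters/edit are the standard Databricks Clusters API endpoints.

```powershell
# Sketch: overlay user values onto the team's default template, then create the cluster
# if it doesn't exist yet, or edit it if it does. File and field names are illustrative.
$headers      = @{ Authorization = "Bearer $($env:DATABRICKS_TOKEN)" }
$base         = $env:DATABRICKS_HOST
$clusterName  = "dev-my-product-feature-price-model"   # built by convention, not by the user
$sparkVersion = "8.2.x-cpu-ml-scala2.12"               # resolved from the simplified version fields

$template = Get-Content "data-engineering/cluster-config-template.json" -Raw | ConvertFrom-Json
$user     = Get-Content "clusterAndLibrariesConfig.json" -Raw | ConvertFrom-Json

$template.cluster_name  = $clusterName
$template.spark_version = $sparkVersion
$template.node_type_id  = $user.cluster.node_type_id
$template.autoscale     = @{ min_workers = $user.cluster.autoscale.min_workers
                             max_workers = $user.cluster.autoscale.max_workers }

# Create or edit depending on whether a cluster with this name already exists.
$existing = (Invoke-RestMethod -Method Get -Uri "$base/api/2.0/clusters/list" -Headers $headers).clusters |
            Where-Object { $_.cluster_name -eq $clusterName }
if ($null -eq $existing) {
    Invoke-RestMethod -Method Post -Uri "$base/api/2.0/clusters/create" -Headers $headers `
        -ContentType "application/json" -Body ($template | ConvertTo-Json -Depth 10)
} else {
    $template | Add-Member -NotePropertyName cluster_id -NotePropertyValue $existing.cluster_id -Force
    Invoke-RestMethod -Method Post -Uri "$base/api/2.0/clusters/edit" -Headers $headers `
        -ContentType "application/json" -Body ($template | ConvertTo-Json -Depth 10)
}
```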
Once the permissions are set, we now have to install the libraries. The parse requested libraries step takes the libraries in the config file and parses them, and the next step takes that parsed JSON, installs any new libraries, and uninstalls any libraries that were removed from the config file. Now, let us take a look at our completed pipeline. Once we go into the run, you can see that all of the steps completed successfully. That means we should see a new cluster running in Databricks with the config specified in the config file. So, let’s go over to Databricks and look at the cluster.
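A simplified sketch of that install/uninstall reconciliation against the Databricks Libraries API, handling only PyPI packages and using placeholder config values, might look like this.

```powershell
# Sketch: install newly requested libraries and uninstall ones removed from the config.
# Only PyPI packages are handled here; the requested list and cluster ID are placeholders.
$headers   = @{ Authorization = "Bearer $($env:DATABRICKS_TOKEN)" }
$base      = $env:DATABRICKS_HOST
$clusterId = "<existing-cluster-id>"
$requested = @("xgboost==1.4.2", "holidays")   # parsed from the config file in the real pipeline

$status    = Invoke-RestMethod -Method Get -Headers $headers `
               -Uri "$base/api/2.0/libraries/cluster-status?cluster_id=$clusterId"
$installed = @($status.library_statuses.library.pypi.package | Where-Object { $_ })

$toInstall   = @($requested | Where-Object { $installed -notcontains $_ })
$toUninstall = @($installed | Where-Object { $requested -notcontains $_ })

if ($toInstall.Count -gt 0) {
    $body = @{ cluster_id = $clusterId
               libraries  = @($toInstall | ForEach-Object { @{ pypi = @{ package = $_ } } }) } |
            ConvertTo-Json -Depth 10
    Invoke-RestMethod -Method Post -Uri "$base/api/2.0/libraries/install" -Headers $headers `
        -ContentType "application/json" -Body $body
}
if ($toUninstall.Count -gt 0) {
    $body = @{ cluster_id = $clusterId
               libraries  = @($toUninstall | ForEach-Object { @{ pypi = @{ package = $_ } } }) } |
            ConvertTo-Json -Depth 10
    # Uninstalls take effect the next time the cluster is restarted.
    Invoke-RestMethod -Method Post -Uri "$base/api/2.0/libraries/uninstall" -Headers $headers `
        -ContentType "application/json" -Body $body
}
```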
As you can see here, we now have a cluster running in Databricks. The name for this cluster was defined in the pipeline by the organizational naming convention. As you can see, the configuration here is exactly what we had specified in our config file. Likewise, the users specified in our access control list are listed here under the permissions. And lastly, when we go into our libraries, we see that the libraries we requested have been installed, in addition to a couple of libraries that were specified in the default libraries file. So, wrapping it all up: step one is to copy the pipeline and config files into the product repo and create those pipelines. And then once that’s complete, the user fills out their cluster and libraries config file and saves it. The pipeline then runs automatically, and the user has their own cluster to use within minutes.
The next pipeline we’re going to look at is the jobs deployment pipeline. This pipeline provides automated and secure jobs deployment. So let’s start in our product repo with the user’s workflow. When the user is ready for their code and jobs to be deployed in the dev environment, they will come to the jobs config JSON file. Here, they will specify how many jobs will be created and the configuration for each. The first parameter they’ll have to fill out is the name of the notebook. Next, they will specify if they want to use a high concurrency cluster or a jobs cluster to execute their job. Using a high concurrency cluster instead of a jobs cluster will reduce the cluster start-up latency when jobs deploy, and allow jobs to be run in parallel. Next, they will specify the cluster configuration and a set of libraries.
If they’re using a high concurrency cluster, as in the first example here, the cluster config and libraries are set based on the configuration in the cluster and libraries config JSON. So, the values in the jobs config for these parameters will be null. If the job is going to be run on a jobs cluster, like in the second and third examples here, there are several options for these two parameters. If the user specifies default for either the new cluster or libraries parameters, the config for that respective parameter will be taken from the cluster and libraries config JSON. Additionally, the user can specify none for the libraries parameter if no libraries are required. In the second job, you can see the default config is specified for the cluster configuration, and the libraries parameter is set to none, indicating no libraries are required for this jobs cluster. The last option for these two parameters is to specify a new cluster config or set of libraries, as I’ve shown here in the third job.
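To make those three options concrete, here is a hypothetical sketch of a jobs config covering each case; the schema is illustrative, not the framework’s actual file.

```powershell
# Hypothetical jobs config: (1) high concurrency cluster, (2) default jobs cluster with no
# libraries, (3) custom jobs cluster with its own libraries. Field names are illustrative.
$jobsConfigJson = @'
{
  "workspace_name": "data-science",
  "jobs": [
    { "notebook": "notebooks/score_model",
      "high_concurrency": true,  "new_cluster": null,      "libraries": null },
    { "notebook": "notebooks/retrain_model",
      "high_concurrency": false, "new_cluster": "default", "libraries": "none" },
    { "notebook": "notebooks/backfill_features",
      "high_concurrency": false,
      "new_cluster": { "node_type_id": "Standard_DS3_v2",
                       "autoscale": { "min_workers": 1, "max_workers": 4 } },
      "libraries": { "pypi": [ { "package": "holidays" } ] } }
  ]
}
'@
$jobs = ($jobsConfigJson | ConvertFrom-Json).jobs
$jobs | ForEach-Object { Write-Host "$($_.notebook) -> high concurrency: $($_.high_concurrency)" }
```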
Configs are specified using the same structure as is used in the cluster and libraries config. So, we have the cluster configuration here and, below it, the libraries configuration. Once we’ve saved our changes back to our branch and committed all our code to master, we can run the jobs deployment pipeline. Let us do that now. Currently this is a manual process, so we’ll go into our pipelines in DevOps to trigger the pipeline. Notice the cleanup branches after prod deployment radio button: if yes is selected, then after the jobs have been deployed to the prod environment, the pipeline will go back and clean up any clusters or feature branches that are no longer needed. I’m only going to run the dev stage for now.
While this runs, let’s take a closer look at what the pipeline is doing. Back in the product repo, the pipeline jobs YAML file invokes the pipeline steps from the pipeline jobs config YAML file stored in the CI/CD repo. So, let us go over to the CI/CD repo to see what those steps are doing. There are three different environments the jobs will be deployed to: dev, test, and prod. The same general steps will be repeated for each environment. So we use a build steps parameter, which allows us to only have to write the code for the steps once, and those steps will be repeated for each environment. You will notice many of the steps are the same as what you saw in the cluster pipeline, since much of the code can be shared between the two pipelines. So, let us take a look at the steps.
Like with the cluster pipeline, the first couple of steps do some administrative work to set up the pipeline. The next step is to deploy the code to the environment before the jobs are created. For this part we’ll be using one of the latest features from Databricks, called Repos. Repos provides a much more robust Git integration. Previously, notebooks were synced to Git repos individually through the workspace feature. With Repos, you can link entire branches at once.
Additionally, with the introduction of the Files feature, you will be able to see and edit not only your notebooks, but other files like JSON and YAML files, directly within Databricks. This means that users won’t even need to leave Databricks to edit their config files, which makes their workflow even simpler and more streamlined. When using the new Repos feature, we had to add a new piece to our architecture. The Repos fetch and checkout API requires an Azure Active Directory access token to execute the necessary Git actions, and cannot use the service principal PAT we’ve been using in the pipeline so far for authentication. So, we had to find a way to generate an Azure access token, which requires the username and password of an Azure Active Directory user. So we created a dedicated Azure Active Directory user and added it as an admin user to the Databricks workspace. I want to take a couple of minutes here to walk you through the PowerShell script that generates that access token and uses it to call the fetch and checkout API.
So, the first lines here make the secrets from the variable group specified in the YAML available to be used in the script. Next, we have a link to the documentation for the process, which includes some Python examples. The branch we are going to be deploying our code from is the branch that triggered the jobs pipeline. So in our case, we will use master as our deployment branch. The product repo name is the name of the repo where our code to be deployed lives, and the runtime environment corresponds to which stage we are in, for example, dev, test, or prod. Next, you will need to modify the token URL by replacing the tenant ID in the URL with your own tenant ID. Then we can construct the header and body of the API call. We will need to use the Azure Active Directory username and password to generate the access token. And once we have the access token, we can construct the header and body for the Repos API call.
The Repos file path variable contains the path to where your code is stored in Databricks. Once we send the Repos body to the Repos API, the API will execute a git pull to update your Databricks repo with the latest code from the branch you specified. And that’s how we deploy code using the Repos API. This script will be available in the supplemental materials as well. So now let’s go back to the YAML file and talk about what happens after the code is deployed. The next step is the generate cluster config step. The script does two things. First, it generates the default cluster config that will be used when the user opts to use a high concurrency cluster, or when they designate default for their jobs cluster definition. The second thing it does is to parse a new jobs cluster configuration when provided by the user.
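A compressed sketch of that token-and-checkout flow is shown below. The tenant ID, client ID, credentials, repo path, and branch are placeholders; the resource ID 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known Azure Databricks application ID used when requesting Azure AD access tokens, and the PATCH against /api/2.0/repos performs the fetch and checkout.

```powershell
# Compressed sketch of the Azure AD token + Repos update flow. IDs, credentials, and paths
# are placeholders; the script described in the talk is more complete.
$tenantId  = "your-tenant-id"
$clientId  = "an-azure-ad-app-client-id"
$tokenUrl  = "https://login.microsoftonline.com/$tenantId/oauth2/token"
$tokenBody = @{
    grant_type = "password"                             # resource-owner flow with the pipeline's AAD user
    client_id  = $clientId
    resource   = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d" # Azure Databricks resource ID
    username   = $env:AAD_USERNAME                      # mapped in from the variable group
    password   = $env:AAD_PASSWORD
}
$aadToken = (Invoke-RestMethod -Method Post -Uri $tokenUrl -Body $tokenBody).access_token

# Find the Databricks repo that mirrors the deployment path, then update it to the
# deployment branch; the PATCH performs the fetch and checkout (effectively a git pull).
$headers = @{ Authorization = "Bearer $aadToken" }
$base    = $env:DATABRICKS_HOST
$repo    = (Invoke-RestMethod -Method Get -Headers $headers `
              -Uri "$base/api/2.0/repos?path_prefix=/Repos/dev/my-product").repos | Select-Object -First 1

Invoke-RestMethod -Method Patch -Uri "$base/api/2.0/repos/$($repo.id)" -Headers $headers `
    -ContentType "application/json" -Body (@{ branch = "master" } | ConvertTo-Json)
```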
So, once those configs are parsed, the create or edit high concurrency cluster step will check to see if a high concurrency cluster is required for the set of jobs being deployed, and create or update the cluster if needed. If a high concurrency cluster is required, the next step is going to set the proper permissions for that cluster. This includes adding permissions for the dev, test, or prod service principals, which are going to allow the cluster to be managed by other applications like Azure Data Factory or Airflow. The next step is to parse the libraries that are going to be installed on the cluster, like with the cluster config parsing step. This step is going to parse both the default set of libraries and any custom set specified by the user. So, after we have all those configs ready, we can deploy our new jobs or edit our existing jobs in the next step.
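A minimal sketch of that deploy-or-edit step against the Jobs API, with placeholder job settings and names, could look like this; jobs/list, jobs/create, and jobs/reset are the standard 2.0 endpoints.

```powershell
# Sketch: create the job if it doesn't exist yet, otherwise overwrite its settings with
# jobs/reset. The job name and settings here are placeholders.
$headers = @{ Authorization = "Bearer $($env:DATABRICKS_TOKEN)" }
$base    = $env:DATABRICKS_HOST
$jobName = "dev-my-product-retrain_model"   # built from environment, repo, and notebook by convention

$settings = @{
    name          = $jobName
    new_cluster   = @{ spark_version = "8.2.x-cpu-ml-scala2.12"
                       node_type_id  = "Standard_DS3_v2"
                       autoscale     = @{ min_workers = 1; max_workers = 8 } }
    notebook_task = @{ notebook_path = "/Repos/dev/my-product/notebooks/retrain_model" }
    libraries     = @(@{ pypi = @{ package = "xgboost==1.4.2" } })
}

$existing = (Invoke-RestMethod -Method Get -Uri "$base/api/2.0/jobs/list" -Headers $headers).jobs |
            Where-Object { $_.settings.name -eq $jobName }
if ($null -eq $existing) {
    Invoke-RestMethod -Method Post -Uri "$base/api/2.0/jobs/create" -Headers $headers `
        -ContentType "application/json" -Body ($settings | ConvertTo-Json -Depth 10)
} else {
    Invoke-RestMethod -Method Post -Uri "$base/api/2.0/jobs/reset" -Headers $headers `
        -ContentType "application/json" `
        -Body (@{ job_id = $existing.job_id; new_settings = $settings } | ConvertTo-Json -Depth 10)
}
```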
Finally, once our jobs have been deployed, we need to set or update permissions. You can define permissions for a custom set of users or groups in the default values for jobs stored in the team’s folder in the CI/CD repo. In addition to these permissions, the dev, test, and prod service principals will be given access so they can orchestrate these jobs from external applications. Once we have all of these steps defined, we can execute them in each of the stages below, which include security and quality gates between each environment deployment. I want to point out a step here specific to the prod stage called branch cleanup. The script within this task only runs if the branch cleanup parameter was set to yes when the pipeline was run. The default behavior is to execute the branch cleanup step, but there are instances where users might need to keep their branch and cluster for additional development.
So, we added this option to increase the flexibility of the pipeline to handle non-standard use cases. This branch cleanup step has two main functions. The first is to delete feature branches that have already been merged to master but not deleted during the merge process. The second function is to delete any Databricks clusters that are associated with branches that have already been deleted. Remember, our cluster pipeline created one cluster for each feature branch. So, once those branches are merged to master and deleted, the cluster associated with that feature branch is no longer needed.
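A simplified sketch of the second cleanup function is shown below, assuming the cluster pipeline tags each cluster with the branch that created it; the tag name and Git remote are illustrative.

```powershell
# Sketch: delete clusters whose feature branch no longer exists in the remote repo.
# Assumes each cluster carries a custom "branch" tag set by the cluster pipeline.
$headers  = @{ Authorization = "Bearer $($env:DATABRICKS_TOKEN)" }
$base     = $env:DATABRICKS_HOST
$branches = git ls-remote --heads origin | ForEach-Object { ($_ -split "refs/heads/")[-1].Trim() }

$clusters = (Invoke-RestMethod -Method Get -Uri "$base/api/2.0/clusters/list" -Headers $headers).clusters
foreach ($cluster in $clusters) {
    $branchTag = $cluster.custom_tags.branch
    if ($branchTag -and ($branches -notcontains $branchTag)) {
        # The branch has been merged and deleted, so its dedicated cluster is no longer needed.
        Invoke-RestMethod -Method Post -Uri "$base/api/2.0/clusters/permanent-delete" -Headers $headers `
            -ContentType "application/json" -Body (@{ cluster_id = $cluster.cluster_id } | ConvertTo-Json)
    }
}
```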
There are a couple of reasons this cleanup step is important to our overall environment. First, it makes it easier to track work in progress. If feature branches are deleted once deployment is complete, then only branches that have active development will exist in the repo. Another reason is that it allows more accurate cost tracking. If a user’s cluster and feature branch are deleted when development of that feature is complete and merged to master, they cannot use those resources for development work on other features or projects. There may also be cost savings: if clusters were left running even after development is complete, the organization would keep paying for them, so the cleanup step deletes those clusters, saving the organization some money. The last reason this is important is that it encourages thoughtful branching practices. If a user’s branch is deleted after their code is merged, users will be encouraged to be specific about what feature is being developed by each branch, and to use a feature branch to only develop one feature at a time. This keeps the environment clean and organized, and also helps track work in progress at a more granular level. So, now let’s take a look at the jobs pipeline we started earlier. Going into the run, we see that all the steps within the dev stage completed successfully, which means we’ll have three new jobs created in our Databricks workspace. So, let’s take a look at those.
You’ll see we have three jobs created for our dev environment, since we configured three jobs in our jobs config file. Notice the cluster definitions for each job here. The first job is using a high concurrency cluster, the second job is using our default cluster config with one to eight workers, and the third is using a custom cluster config with one to four workers. So, let’s go into the second job here and take a look at the configuration. In the configuration tab, you’ll see the specs for the jobs cluster that was created for this run. And then below, in the advanced section, we can see the pipeline gave the correct permissions to the two service principals, admin and dev, as well as the three ACL groups we specified in the default access control template in our team’s folder.
So, to summarize the jobs pipeline: the first step is for the user to fill out their jobs config JSON in their user branch. Next, they merge all their code into master, and then once the code is ready to go, the jobs pipeline can be run and the jobs will be created and run. So, today we created two pipelines to automatically create and manage our Databricks clusters and jobs, with an easy self-service workflow for each. So thanks everyone for listening to my tech demo. I hope you learned some interesting and useful things, and I’m going to pass it back to Wesly to wrap everything up.

Wesly Clark: Thank you, Cara, for giving such a thoughtful technical demo and for all the hard work you’ve put into bringing the vision of this framework into reality. Now I’d like to briefly recap what we discussed today. We told you what J.B. Hunt is trying to accomplish as an organization, and how our team has focused its work to ensure we are contributing to the realization of that mission. We drew your focus to the section of the MLOps life cycle where our framework operates, and where we feel the most iterative improvement occurs. We described the motivating principles that guided us while we implemented our solution. We visually represented the framework with a series of architectural diagrams to help you see the big picture. And then Cara gave you a closer look through a step-by-step technical demo. And finally, the only thing left to say is, thank you for investing your time with us. I hope this was helpful; additional resources and details will be available to download after the presentation to help get you started. Thank you.

Cara Phillips

Cara is a Data Science and MLOps consultant at Artis Consulting. She focuses on developing MLOps strategies and the technical implementation of those strategies in automated CI/CD workflows. Her most ...

Wesly Clark

Wesly Clark has degrees in Theoretical Mathematics, Computer Science, and a Masters in the Management of Information Systems. Over the course of his career at J.B. Hunt he has focused on establishing ...