How to Build a ML Platform Efficiently Using Open-Source

May 28, 2021 10:30 AM (PT)


Fast-growing startups usually face a common set of challenges when employing machine learning. Data scientists are expected to work on new products and develop new models as well as iterate on existing ones. Once in production, models should be continuously monitored and regularly maintained as the infrastructure evolves. Before too long, data scientists end up spending most of their time doing maintenance and firefighting of existing models instead of creating new ones.

At GetYourGuide, we faced these challenges and decided to think about machine learning development holistically, which led us to our machine learning platform. Our platform uses MLflow to keep track of our machine learning life-cycle and ease the development experience. To integrate our models into our production environment, we also need to deal with additional requirements like API specification, SLOs and monitoring. To empower our data scientists, we have built a templating system that takes care of the heavy lifting of going to production, leveraging software engineering tools and ML-specific ones like BentoML.
In this talk we will present:
– Our previous approaches for deploying models and their tradeoffs
– Our data science and platform principles
– The main functionalities of our platform
– A live demo to create a new service
– Our learnings in the process

In this session watch:
Jean Machado, Senior Software Engineer, GetYourGuide
Theodore Meynard, Developer, GetYourGuide

 

Transcript

Jean Machado: Hello, everybody. Welcome to our deep dive into how we built GetYourGuide’s machine-learning platform using open source. We are honored to have you with us today. My name is Jean Machado. Theo and I will introduce ourselves, and we’ll follow with the agenda for this presentation. Next slide, please.
So, my name is Jean Carlo Machado, I’m a Senior Software Engineer at GetYourGuide. I live in Berlin, but I’m Brazilian. At GetYourGuide I’m working on search, recommendation systems, and the machine-learning platform. I’m also excited to see the MLOps field evolve, and to see what machine learning and automation can do for society. My biggest hobby, by the way, is programming.

Theodore Meynard: Hi. I’m Theo. I’m working as a Senior Data Scientist on the recommender system and search ranking, and also on the ML platform. I’m also involved in PyData Berlin, where I help organize monthly meetups. And if I’m not doing all of that, I love to ride my bike looking for the best bakery or patisserie in town. So, yes, you might have guessed, I am French. Now let’s go back to Jean, who will present the agenda.

Jean Machado: So, the agenda for today is the following: we’re going to do a brief introduction to GetYourGuide and then talk about machine learning at GetYourGuide, our journey through machine learning, and the levels of operational maturity we’ve been through. We will then look at our platform, how it solved the problems we were facing with operations, and the main components of the platform. And then, to get very hands-on, we will walk you through a demo of how we do automatic training with live inference. And finally, we’re going to close with some learnings and final remarks.
Let’s then dive into it, and go through our introduction. So, to talk about the machine-learning platform at GetYourGuide, you have to talk about GetYourGuide. Next slide, please. So, for those of you who don’t know, at GetYourGuide we’ve built the world’s largest marketplace for traveler experiences. We are connecting millions of customers to over 40,000 experiences worldwide. To summarize, our mission is to give the world access to incredible experiences.
So, do you want to skydive over the Sugarloaf in Rio, see the sunset from the top of the Empire State Building, or dive into the Great Barrier Reef in Australia? With GetYourGuide, these experiences are just a few clicks away. The pandemic hit us hard, as it did the entire travel sector. Nevertheless, we took this time to invest heavily in our foundations and our technology, and one such effort is our machine-learning platform. Now Theo will talk about how we do machine learning at GetYourGuide.

Theodore Meynard: Yes. So, after this short introduction, let’s dive into our machine-learning use cases. Here, I will show you some of the data products we built. The first one is the Ranking Service: we use machine learning to rank activities. Indeed, we want to propose the most relevant activities from our inventory on the different landing pages and search result pages. We also use machine learning to recommend personalized activities to our customers in different web components. Here you can see a screenshot of my recommendations in Berlin, so you might have guessed that I often travel with my family.
We also use machine learning to help tourists discover a destination and find the best activity on GetYourGuide with Paid Search, in collaboration with Google. We also use machine learning to forecast future demand, which is quite useful for planning and for pre-buying tickets. And we use machine learning to automate the labeling of our inventory. So, we have more than 20 projects distributed across two teams. In addition, we have some models that we deliver to other teams. As you can imagine, we have started to reach a critical size.
And to stay aligned across different projects and teams, we agreed on a set of principles to guide our work. We split them into six different pillars. The first one is strategy: how do we prioritize our work? For example, we favor business value over state-of-the-art models. The second one is about the workflow: how do we plan our work? For example, we incorporate data to inform our planning. The third one is about the model: how do we build our models? For example, we value small iterations on existing models, and performance is only proven online with A/B tests.
The next one is about QA and monitoring: how do we assure the quality of our models? Here, for example, we aim to have solid and resilient deployments. The fifth one is about engineering: how do we build our infrastructure? Here, we want to make sure our work is reproducible and modular. And finally, last but not least, stakeholder engagement: how do we engage with partners? Here, we want to promote and educate our partners about the data product mindset.
So, in a previous retrospective gathering all the data product team members, we agreed that we were not up to our own standards in two of these dimensions, especially QA and engineering. At our scale, a broken model will affect the website and, ultimately, our customers, so we needed to improve on these dimensions. This led us to the platform we will present to you. But before that, I will let Jean explain our workflow before the platform.

Jean Machado: So, let’s walk through the way machine learning practices evolved over time at GetYourGuide. Next slide, please. So, until mid-2019, the main way to deliver machine learning predictions was through notebooks. These are Databricks notebooks, as we use Spark and Databricks extensively. They are scheduled through Airflow, which is a great open-source tool for composing data pipelines. Airflow then calls Databricks jobs to do the training and batch inference.
This approach worked very well for some time, as notebooks are perfect for exploration, and data science has a much bigger exploration phase than standard software engineering. Notebooks are just the default tool for machine learning practitioners, and they offer great visualizations. But over time, we saw that this approach also has its weaknesses. Notebooks are easy to break, and they don’t offer proper version control. What if you change a cell by mistake? Well, in our case, changing a cell by mistake can stop ranking relevant products, can show noisy recommendations, or we can pay a lot of money to Google to bid on the wrong ads. It’s also very hard to reuse code from notebooks, and automated testing is not a straightforward practice in this environment either. In other words, notebooks are not great for building stable products in increments, by teams rather than individuals. So, to be true to our principles, we realized we needed to make some improvements here, and we did a major iteration. Next slide, please.
So, our first improvement was to start using version control through Git and GitHub. We then ported our code from the notebooks into Python scripts and deployed them as libraries inside the Databricks jobs. With libraries, we can easily test, reuse, and version our code. Thus, the projects are much more maintainable. Still, after a while, this approach also showed its weaknesses. Much of the process was still manual: it was necessary to change variables between testing and production, which is very error prone, and different projects had different levels of automation. It was also very hard to replicate the development and production environments, making machine learning reproducibility very hard to come by, and making it hard to hand over models between people and make sure they behave in the same way. So, we saw that even after this iteration there were still problems, and at this moment we took a step back and started thinking about machine learning operations more holistically, which led us to think about our platform. Next slide, please.
So, now let’s dive deep into our machine-learning platform. First, let’s look at the features that our machine-learning platform offers. They directly reflect the pains and needs that we had back then. The first feature that we offer is a CI/CD setup, because we wanted to automate the process of setting up new models for training and inference and keeping them running in a healthy state, making good software engineering and quality assurance practices the default, and also giving us a standardized base to build upon. So, we offer CI/CD, and we also offer tracking. Back then we wanted to track our models and their environments to be able to make machine learning reproducible in the spirit of the scientific method, and to learn from the models as they evolve over time. Thus, we introduced MLflow, which is a tool from Databricks responsible for machine learning lifecycle management, and Theo will show you in detail how it works afterwards.
And the other key feature was the support, later on, for live inference. We started by supporting only batch inference, where you pre-compute the results and store them someplace for some other system to pick them up. But we then found the need to predict on many more segments and dimensions than before, so pre-computing these kinds of predictions was no longer practical. And with online inference, we are also able to react to real-time inputs. For use cases like fraud detection, this is really necessary, so it was a feature we then added to our platform as well, and we’re going to show you how it works.
So now that we have taken a look at the key features, let’s look at the principles that guide our platform. A machine-learning platform can be many things. The machine learning operations field is very new and flourishing, and there are many opinionated tools and processes coming up. You can find end-to-end machine-learning platforms, machine learning lifecycle management tools, machine-learning deployment and serving tools, monitoring and model registry tools, feature stores, and so on. So, from the very beginning, we saw that we were in a very large solution space and we needed principles to navigate it and focus on what matters.
So, the first principle we stick to is ownership, as the main goal of a successful platform is to make data science faster in a sustainable way. And what better way to do that than to empower the domain experts end-to-end? The goal is that the data scientist owns the model from idea to production. Another principle we stick to is to make machine learning reproducible. To work as a team and to make progress, we need to be able to review and pick up from each other’s work easily, so we wanted to avoid the kind of “it works on my machine” scenario.
Another key principle is to build incrementally and to leverage the infrastructure that we already have. Our technology has a cost; there is adoption cost and there’s maintenance cost. Instead of picking a tool that promises to solve all problems but forces us to redraw our architecture and rebuild the way we work from scratch, we believe in solving specific problems incrementally. We also have great ops teams at GetYourGuide supporting a lot of excellent CI/CD tools. Instead of picking a tool that does more or less the same, we prefer adopting the existing tools and collaborating with those teams. That also means that when something goes wrong, everybody can jump on board and help you. The last principle is, as the theme of this conference, that the future is open. We embrace open source; we have a small team and we simply don’t have the capacity to build a platform from scratch, so we have to leverage what the community has already built and give back as well.
So, now that we have seen the principles, let’s see how we revised our workflow and how it looks right now. The data scientist’s work usually starts in a notebook, as before. But when we know we want to maintain a project, we migrate it to GitHub using an ML service template that we provide. This template is Jinja-based, and we use our internal development-enablement tooling to set up this template and apply all the necessary infrastructure changes to make the underlying service work. So, you run our internal tool and it sets everything up, including GitHub, the AWS infrastructure, and so on.
Once you have the GitHub project, every commit you make will follow the Gitflow paradigm, so it will pass through automatic checks on the CI/CD provided by the template. Once our quality assurance checks pass, we then run our automatic training on Databricks, and in the end, inference happens. Inference can be either online or in batch; batch is done on Databricks and online is on Kubernetes. But we also saw that developing on Databricks with Git repos rather than notebooks can be challenging. One can use Databricks Connect, but it has high latency for moving data between the cloud and the local machine. One can build the library and install it on the cluster, but it takes a while to start a cluster. Or one can copy and paste the code from a notebook to GitHub, and so on and so forth, but that is pretty error prone as well.
So, we found these approaches cumbersome and we came up with our own method. We created an open-source tool that data scientists can use, called db-rocket, to speed up the feedback loop during development. This tool is responsible for keeping changes to Python projects on the local machine in sync with Databricks notebooks as fast as possible. Using the tool, you can change something in your local copy of the repository using PyCharm or any other text editor, and by hitting save, the changes are immediately available in the specific notebook. The local code goes to the data, rather than the data going to the code as happens with Databricks Connect. So, now that we have looked at the workflow from a very high level, Theo will zoom in on its main components, starting with the CI/CD.

Theodore Meynard: Yes. So, we will have a closer look at the CI/CD steps that a data scientist is responsible for. It might seem quite simple: at the start, a data scientist has some code and then needs to train. When the training is finished, you deploy, and finally, you serve the predictions. However, you might also want to retrain your model regularly on fresh new data, and then follow the same pattern of deployment and serving. But you can also have, for example, a hotfix that you want to apply to some non-model-related code, and you want to deploy just that. For example, you have an API change that you need to make because you forgot a special use case or edge case. You cannot wait a few hours for the training to be done before releasing the new version, so you need a shortcut.
And we might also have a new model which was working well on the training data and also on the test data, but when it’s put online, it’s wrong. In this case, we want to be able to quickly roll back to an older version. So, what we need is a workflow able to accommodate all these possibilities, and that is not that easy. With that in mind, let’s zoom in on the different steps.
So, the first one is the training path. After a commit from the data scientist, we would like first to validate that the code is correct with some automatic checks. Here, the validations run in Woodpecker, which is an open-source continuous integration tool, and all the configuration is already in place thanks to the bootstrap step from Git Dev. When all the checks and tests have passed, we can start the next step, which is building the training image.
So, here, we build a Docker image using the base image from Databricks, and we bring all the dependencies and the project code together. When the Docker image is built, we push it to ECR. When that is done, we go to the third phase, which is the training job. Here we run the job on Databricks, with all the powerful compute and all the production data, using the previously generated image. The output of all of that is a trained model, and we save it using MLflow.
So, MLflow is an open-source tool to manage models, which was started by and is now supported by Databricks. It allows us to automatically store the model that has just been trained, the parameters that were used for this model, and the metrics from the run. We can also keep a link to the Docker image with all the dependencies, which makes sure our work is reproducible, which is one of our principles. We also want to be able to trigger regular retrainings from Airflow, using an existing image but training on new data, and when this is done, we also want to push the model to MLflow for tracking.
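To make this concrete, here is a minimal sketch of what such a tracking call could look like; the tag name, image reference, and parameter values are illustrative assumptions, not the actual code from the platform:

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    with mlflow.start_run():
        # link the run to the training image it ran in (hypothetical ECR reference)
        mlflow.set_tag("training_image",
                       "123456789012.dkr.ecr.eu-central-1.amazonaws.com/demo-iris:abc1234")

        model = RandomForestClassifier(n_estimators=100)
        model.fit(X, y)

        mlflow.log_param("n_estimators", 100)                    # a parameter used for this model
        mlflow.log_metric("train_accuracy", model.score(X, y))   # a metric from the run
        mlflow.sklearn.log_model(model, artifact_path="model")   # store the trained model itself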
So, now that our model is trained, we should deploy it. We can deploy it for batch inference or online inference. First, we’ll look into batch inference. Here, again, Airflow is triggered: the training is done, and now we can use Airflow to predict on the data set every day, every hour, or every week, depending on the use case.
Airflow will regularly trigger an inference job on Databricks. The inference job itself will load the model from MLflow and the Docker image from Amazon ECR. With that, we can predict on the daily data set and, when it’s done, save the predictions to S3 for downstream consumers. So, here there is no true serving phase that the data scientists need to take care of in batch inference; the consumers take care of loading the predictions and using them however they need. This closes the batch inference use case. Now we’ll go into online inference, but first you might ask yourself, “Why?”
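As an illustration, a batch inference job along these lines could look roughly like the sketch below; the model URI, feature table, and S3 path are assumptions made up for the example:

    import mlflow.pyfunc
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # hypothetical registered-model URI and location of the daily feature table
    MODEL_URI = "models:/demo-iris/Production"
    features = spark.read.table("features.daily_iris_measurements")

    # wrap the MLflow model as a Spark UDF and score the daily data set
    predict_udf = mlflow.pyfunc.spark_udf(spark, MODEL_URI)
    scored = features.withColumn(
        "prediction",
        predict_udf("sepal_length", "sepal_width", "petal_length", "petal_width"),
    )

    # save the predictions to S3 for downstream consumers to pick up
    scored.write.mode("overwrite").parquet("s3://example-bucket/predictions/demo-iris/")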
So, as Jean explained earlier, online inference might be smarter for some use cases, for example when the features are changing very quickly, like a user’s history, or when the number of predictions you would need to pre-compute to cover all the cases is too big. Usually, when you go into personalization, you run into this problem. To solve this, we need to predict only on the requested inputs, and for this you need online inference, which is more complex, but also much more powerful.
So, here is the flow for online inference. Just as a reminder, we have a training job, originally triggered by a data scientist’s commit or an Airflow trigger, and the model was stored in MLflow. Now, we can automatically trigger a deployment pipeline after the training. First, what happens is building a production image: we need to download the model from MLflow and transform it into a web service able to accept HTTP requests.
To do so, we use BentoML, an open-source project able to handle that. It creates a Docker image with the model and all the necessary infrastructure inside. Once this production image is done, we can push it to ECR. Then we can deploy the model: we have a service in a Docker image, so actually we’re back to traditional DevOps and can leverage all the current infrastructure used by the rest of the tech teams. Here, we use continuous deployment with Spinnaker, which is another open-source project for continuous deployment.
So, Spinnaker allows us to do a canary deployment, a rollback, or even more complex deployment workflows if needed. Once the deployment is done, we have our model in our Kubernetes cluster, where we have all the monitoring and alerting to make sure the service is healthy and stays healthy. Now, we can zoom into the serving part, and I will show in more detail the web service that we built using BentoML. Actually, to package a model into a web service, there are many options, and we compared quite a lot of them.
For example, we looked into using just MLflow, and MLflow in combination with SageMaker. We also looked at Seldon, and we also considered doing it ourselves. However, we settled on the BentoML framework for the following reasons. First, it works quite well with MLflow models; we actually helped add some documentation to BentoML showing how to take an MLflow model and put it into BentoML. We also get an OpenAPI specification, which is an open standard and widely used in our company. This allows our engineers to quickly see the API specification of the service in a nice UI.
Also, we can package multiple models together. This is quite useful for A/B tests; indeed, performance is always proven online, so we are constantly testing new models, and having two models packed together reduces the complexity of the overall service. We are also able to customize the Docker image and the dependencies quite a bit, which is quite nice, especially for the Datadog alerting that we’re using and the logs that we send to Kibana.
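As a rough sketch of that idea, assuming the BentoML 0.x API that was current at the time and made-up artifact names, a service packaging two model variants for an A/B test could look something like this:

    import bentoml
    from bentoml.adapters import DataframeInput
    from bentoml.frameworks.sklearn import SklearnModelArtifact

    @bentoml.env(infer_pip_packages=True)
    @bentoml.artifacts([SklearnModelArtifact("model_a"), SklearnModelArtifact("model_b")])
    class IrisABTestService(bentoml.BentoService):
        # each method becomes its own HTTP route, so the caller chooses the variant
        @bentoml.api(input=DataframeInput(), batch=True)
        def predict_a(self, df):
            return self.artifacts.model_a.predict(df)

        @bentoml.api(input=DataframeInput(), batch=True)
        def predict_b(self, df):
            return self.artifacts.model_b.predict(df)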
And finally, last but not least, it makes pre- and post-processing easy. The framework makes it very easy for a data scientist to add these kinds of capabilities, and we need them for some business logic. So now I will present some code snippets showing how we can load a model from MLflow into BentoML, and also how easy the pre- and post-processing capabilities are.
So, imagine that you have an iris classifier which is trying to predict the type of iris based on some flower measurements. This is a very classic scikit-learn data set. We have the model already stored in MLflow, and we want to create an online service using BentoML. We need two files: one file to load the model from MLflow into BentoML, and a second one to define the BentoML service. In the first one, you load the MLflow model, then you create an object from the BentoML class, pack the model into it, and finally save it. And that’s it.
On the right, we have the BentoML service. Here, we specify where the model will be saved, and then we create a method for predictions, here just called predict. BentoML will basically transform this method into a route in the service. So, basically, what the data scientist has to do is take a data frame and return a list of predictions, and for that they can use the model that has been saved under the artifacts attribute. And that’s it. Data scientists can easily define the function or functions they need for their application and expose them in the web service. And so, the data scientist is now owning the full cycle from the training to the serving of their model.
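A minimal sketch of those two files, again assuming the BentoML 0.x API and hypothetical model and module names, might look like this:

    # iris_service.py - the BentoML service definition
    import bentoml
    from bentoml.adapters import DataframeInput
    from bentoml.frameworks.sklearn import SklearnModelArtifact

    @bentoml.env(infer_pip_packages=True)
    @bentoml.artifacts([SklearnModelArtifact("model")])
    class IrisClassifierService(bentoml.BentoService):
        @bentoml.api(input=DataframeInput(), batch=True)
        def predict(self, df):
            # pre-processing or business logic could go here
            predictions = self.artifacts.model.predict(df)
            # post-processing: return a plain list of predicted classes
            return predictions.tolist()

    # pack_model.py - load the model from MLflow and pack it into the service
    import mlflow.sklearn
    from iris_service import IrisClassifierService

    model = mlflow.sklearn.load_model("models:/demo-iris/Production")  # hypothetical model URI
    service = IrisClassifierService()
    service.pack("model", model)    # pack the model under the artifact name "model"
    saved_path = service.save()     # persist the service so it can be containerized
    print(f"Service saved to {saved_path}")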

Jean Machado: Let’s go to our demo, then, and get hands-on with an example of setting up a project. The goal here is to start from nothing, set up a new service, do automatic training, and automatically set up a production live-inference endpoint. We start with the Jinja-based dev tool we mentioned before; it has a command to bootstrap our project, and the tool can bootstrap many different types of projects. Here we pass our ML service type, as it is optimized for doing ML; we say which team owns the service and some other details, like where to store the Docker images; and we give it a name, so let’s call it Demo Iris.
And let’s see, it is setting up our infrastructure now. It will ask us a set of questions, and we can customize here the module name of our Python application; let’s not change this. Here it’s asking us if we want a live inference service rather than a batch inference one, which is the case now. And there is some other information that you can leave as default, including the location of the MLflow service. So, given this information, it starts doing the job, and it will create not only the files locally but also a GitHub repository for us, and the whole AWS infrastructure necessary to run this service. It will take some time until everything is in place, around two to three minutes, so we’re going to pause here and resume when it finishes.
So, our pipeline finished. Many steps happened, and there are some logs with detailed information; let’s proceed to check the files. Let’s enter Demo Iris, our new repo that was created by the command. Here we can see that there is a project set up for us with many different files. Right now, the files are in Git but not committed. Let’s commit them and send them upstream so we can start the Gitflow process and get the build running while we explore the project a bit more.
I’m creating a commit here with all the files, and we can push them. Done. They are now on the newly created GitHub project, which I will show you after we take a look at our local code. So, the demo file structure is here. You can see that, first, there is the module with the project name; that is where the application code goes and where we develop our models. There’s also a folder called docs, where we can find documentation about the ML service. And the third folder is the SDK of the ML service, which wires up all the pieces of infrastructure together.
We also have folders with samples, where we can find ways to use MLflow and live inference with different models, so we can see the different things that are possible. Spinnaker is a folder created automatically for our deployment process; here we can configure canaries, rollbacks, and many other options. The test folder is where our automated tests go, and that’s basically the folder structure of a new ML service.
Let’s zoom in, then, on the training process that gets automated for us by default. We find a train file in the project’s main module, and if we look at it, it’s basically calling our samples. Here the sample that we are using is the iris classification sample, using scikit-learn with a random forest classifier. You can find examples online that do exactly this: it is basically splitting a data set and training. The difference from what you find online is that we are calling MLflow and integrating our model with it, logging it basically. You can see our model is trained in this line, and it is being persisted and logged in line 35.
So, with this, we get our model saved in MLflow. There are some other details you can see here; an interesting one is that autolog is enabled. This means that MLflow will look at the model structure, infer a lot of useful information for us, and log it in MLflow, which we can check in a moment. So, now that you have a sense of how we do training in our ML service, let’s take a look outside the local machine at what got created for us and how it works.
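In the same spirit as the tracking sketch earlier, the automated train step could look roughly like this; the sample data set, the model choice, and the registered model name are assumptions for illustration, not the template’s actual code:

    # train.py - a sketch of the automated training entry point
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def train():
        X, y = load_iris(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

        mlflow.autolog()  # let MLflow inspect the model and log parameters, metrics, and artifacts

        with mlflow.start_run():
            model = RandomForestClassifier().fit(X_train, y_train)
            mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
            # persist and register the model so the deployment step can pick it up
            mlflow.sklearn.log_model(model, artifact_path="model",
                                     registered_model_name="demo-iris")

    if __name__ == "__main__":
        train()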
In the bootstrap command there was a phase where we set up the GitHub project, and the push that I made brought the files here. It’s the same file structure you saw before. After setting up GitHub, we also have our automated Gitflow step using Woodpecker. It is also set up automatically for us, and you can see the build for the commit we just made and the steps it is composed of. The interesting parts here are the QA checking phase, where we do many steps of quality assurance; we have a credential check to make sure we don’t put any sensitive information in the repository.
We found it useful to use pre-commit in the CI as well; normally it’s used on the local machine only, but we can make some extra checks in the CI by having it here too. Unit tests and integration tests also have their own phases; we do type checking with Mypy and we have a coverage phase. Once all these checks are done and successful, we build our Databricks image with the source code and trigger our Databricks training job, which is the job that is running right now. We’re going to stop the video here until the job finishes, so we can see how it goes. So, our training step is done, as you can see, and let’s take a look at its results. The end result of a training run is a registered model; to see that, we have to go to Databricks, in the model section of MLflow. Now Theo will walk you through the MLflow page.
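As an aside, the “trigger our Databricks job” step could, for example, be implemented against the Databricks Jobs API; the environment variable names and the pre-created job ID below are assumptions, not the actual pipeline code:

    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]           # e.g. https://<workspace>.cloud.databricks.com
    token = os.environ["DATABRICKS_TOKEN"]
    job_id = int(os.environ["TRAINING_JOB_ID"])    # a training job created during bootstrap

    # ask Databricks to start the training job now
    response = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": job_id},
        timeout=30,
    )
    response.raise_for_status()
    print("Started training run:", response.json()["run_id"])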

Theodore Meynard: Yeah. So, here, as you can see, we have a new model, Demo Iris, which has been registered, and by clicking on it we can see all the different versions. In our case we just have the first version, which was registered, but we can click on it to get more information. On top, you can see the source run. You can also add some tags if needed. If you scroll down, here you have the schema with the inputs and outputs of the model.
So, let’s get more information on the run by clicking on it. Here we have, on top, the source of the run. We also have the Git commit, so we know exactly the version of the code that was run. Scrolling down, we have the different parameters that were automatically logged using autolog, and the metrics. Finally, there are some tags about the model, and we have the Artifacts, which is the most important thing: the model itself.
The last one, actually, is the model .pkl, the model as a pickle file. Also, just on top, we have the input example, some inputs taken from the training data. And finally, the conda.yaml and the MLmodel file, which are two files that make sure our run is reproducible. Now, I will pass it back to Jean, who will go through the steps of taking this model, trained and saved in MLflow, and putting it into a production service.

Jean Machado: Thanks, Theo. So, let’s take a step back and look at the source code for how we go to production once we have an MLflow model. As you saw before, where we log the model into MLflow is in this line, where we pass the model using the MLflow API. It then stores it online in the cloud, and we are able to use it in the next step with BentoML. So, let’s take a look at how we take the MLflow model and transform it into a BentoML service.
This code is the same you saw summarized in the slides before; it basically downloads the latest MLflow run and puts it into BentoML. We instantiate the BentoML service with this predictor function and pack the model, giving it a name. Once we save it, the model is inside BentoML, and then we can create a Docker container out of it and we have our service ready to go. And if you recall the slides from before, here is how we do the inference part with BentoML.
The first interesting aspect is line eight, where we load the model that we saved before. And here, in more detail than in the slides, you can see the definition of the service’s API, and then finally the prediction part. This class can be extended by the data scientist, and here is where we convert a model into a data product that can be used online to do live inference. Now that we have zoomed into the code, let’s go back to our build and see how this code is transformed into a deployable and shipped to production.
So, coming back to Drone, we had this step of training our model. Once it’s done, a new step in Drone is triggered that is responsible for taking our model to production. It’s already done; if we inspect its details, we can understand what it does. It basically builds the BentoML image whose code we saw before, packaging the model, and saves it into ECR so it can be pulled by Kubernetes. The last step is where the deployment to production happens: it calls Spinnaker, handing over to it to send our code to production. This step is built and supported by our GetYourGuide infrastructure team, as it is for all other services, so we can share ownership and work with the same tools.
So, Spinnaker is the central tool for deployment. Here you can see there is a series of steps by default, and here we can do our rollback. We can also configure canaries and have more complex flows to go to production. But our flow already succeeded, as you can see, which means we already have a service running. Spinnaker deployed it to Kubernetes, and we can now see that our service is functional and rendering something. This page is given to us by BentoML; it’s an OpenAPI website. And we can see that BentoML provides us with some standard endpoints, for instance to do health checks on the service, to query metadata, or to do monitoring with Prometheus.
But the very interesting part for us is the predict endpoint, which is the one that we created and customized. If we open it, we’re going to see that there is an example here of how to use it, and we can try it out. So, let’s do it. If we execute it, it will perform live inference for us immediately, and we can see the results here. This is a predicted iris class; if we change the parameters, we can see the prediction change, which means our service is running and performing live inference correctly.
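Outside the OpenAPI UI, calling such a predict endpoint could look roughly like this; the service URL is made up, and the exact payload shape depends on the input adapter the service uses:

    import requests

    # hypothetical internal URL of the freshly deployed Demo Iris service
    response = requests.post(
        "https://demo-iris.example.internal/predict",
        json=[[5.1, 3.5, 1.4, 0.2]],   # sepal and petal measurements for one flower
        timeout=5,
    )
    print(response.json())             # e.g. [0], the predicted iris class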
So, this takes us through the whole process, from starting a template to doing automatic training to doing online inference. You can see that with a very simple bootstrap command, a data scientist can have something running. And from this moment on, they can customize it and add their own model, with a setup that is all working and only needs to be iterated on further. So, at this point, we conclude our demo, and I will hand it back to Theo to walk us through our final remarks.

Theodore Meynard: So, this closes the demo, and now we’ll go through our final remarks. First, the integration with our existing infrastructure and the reuse of existing tools helped us get other teams’ support and collaboration on the project. However, we also needed some machine-learning-tailored tools for the platform, so we compared and reviewed a lot of them, from AWS SageMaker to Seldon and many others. For that, we needed to test them and make them run on a small project, which was quite time-consuming but also vital to appreciate their ease of use and how they integrate with the rest of the infrastructure. Finally, a complete design document explaining all the moving parts and the constraints specific to machine learning projects was great for aligning everyone in the tech organization on this project. With that, we can go to our conclusion.
So, as we grew our data science team and its impact, we needed to make our processes more sophisticated. This led us to a machine-learning platform where we incorporate good software engineering practices and add some twists for ML. This platform helps our data scientists build models faster by focusing on the model and the data, not on the infrastructure and setting things up. It allows data scientists to deploy more safely with automated tests and CI/CD, that is, continuous integration and continuous deployment. And finally, it helps data scientists document their experiments automatically, to make sure everything is tracked and to understand how they arrived where they are. So, if you’re interested in improving the platform or excited about using it, by the way, we’re hiring, so please do not hesitate to reach out. And that concludes our presentation. Thank you very much for your attention, and please do not hesitate to send us feedback. Thank you.

Jean Machado

Jean Carlo Machado is a Senior Software Engineer at GetYourGuide. He is working with Ranking, Recommendations and the MLPlatform. Jean is excited to see the MLOps field develop and to apply its good p...

Theodore Meynard

Theodore Meynard is a senior data scientist at GetYourGuide. He works on our recommender system to help customers to find the best activities to book and locations to explore. Before GetYourGuide, he ...