Games earn more money than movies and music combined, which means a lot of data is generated as well. One of the development considerations for an ML pipeline is that it must be easy to use, maintain, and integrate. However, it doesn’t necessarily have to be developed from scratch. By using well-known libraries and frameworks and choosing efficient tools whenever possible, we can avoid “reinventing the wheel” while keeping the pipeline flexible and extensible.
Moreover, a fully automated ML pipeline must be reproducible at any point in time for any model, which allows for faster development and easy ways to debug and test each step of the model. This session walks through how to develop a fully automated and scalable Machine Learning pipeline, using the example of an innovative gaming company whose games are played by millions of people every day. That means terabytes of data growth that can be used to produce great products and generate insights on improving them.
Wildlife leverages data to drive the product development lifecycle and deploys data science to drive core product decisions and features, which helps the company stay ahead of the market. We will also cover one of the use cases, which is improving user acquisition through improved LTV models and the use of Apache Spark. Spark’s distributed computing enabled data scientists to run more models in parallel and to innovate faster by onboarding more Machine Learning use cases. For example, using Spark allowed the company to have around 30 models for different kinds of tasks in production.
Speakers: Vini Jaiswal and Arthur Gola
– Hello everyone. I am Arthur. I am a Data Science manager at Wildlife, and here with me is Vini, a customer success engineer at Databricks. Today we’re gonna talk about Using Machine Learning at Scale: A Gaming Industry Experience. There are four topics we would like to cover today. First, we’ll give a little bit of context about the gaming industry. Then we will talk about how Wildlife fits into that context and how our data platform works. Then we will showcase a use case where we leverage our data platform to generate business value, and finally, we’ll talk briefly about other use cases of Data Science at Wildlife. So first, the gaming industry. The gaming industry is very big, right? It’s a huge industry, with nearly $160 billion in revenues expected in 2020. Of that, 77 billion is just mobile gaming, which makes it the largest platform, ahead of PC and console. It’s also the one that grows the fastest: mobile games grow two digits a year, while PC and console grow only one digit a year, which is still quite a lot. Another interesting thing is that mobile gaming alone is larger than music and movies combined. So it’s a huge market and it grows a lot, and that’s one of the reasons why Wildlife has focused on this industry. Our mission here at Wildlife is to develop games that make billions of people a tiny bit happier, and we do that with a blend of art and creativity together with Data Science and high-end technology. Some of our numbers: we’ve released over 60 games to date, and we are in the top 10 of the largest mobile gaming companies in the world. We have over 2.5 billion downloads to date, and we receive tens of terabytes of new data every day from the users in our games. We process petabytes of data daily in our core pipelines, and to handle all this volume we need a very robust data pipeline. And to talk a little bit about that, I call to the stand Vini.
– Thanks Arthur, exciting numbers from the gaming trends. Let’s talk about the architecture that runs behind it. So as we learned, Wildlife games produce millions of transactions from games played on smartphones. As a gamer, you will download the game and play the game. Both of these actions generate a separate stream of data that the company uses for fact-based, informed decision-making. It creates new products and offers simpler and awesome experiences for gamers. Now you can imagine that processing all of this data at a big scale is quite a challenge. And how do you continue to maintain and update it as new data gets generated every day? Devices send the data in every session, in either batch or streaming fashion, through Kafka, which provides durability and scalability to their streams. This gets fed into the S3 bucket. At every moment of the day, there is someone at Wildlife who is consuming the data. They could be developing new products or new features, optimizing the existing features, or exploring gaming insights. So you can imagine processing this data at scale while maintaining the SLA for internal downstream applications. This seems relatively simple, right? All of the player data and gaming events, which are of massive scale, say terabytes, get processed and stored intelligently in the Data Lake. But the data engineering gets complex here, and if you have already experienced it you may know the pains. The key is how we can achieve ETL processing in a consistent, reliable, and most importantly scalable manner. Wildlife standardized most of their production pipelines on Delta, which allows for consistent enrichment and optimization of their downstream applications. The different tiers of the Data Lake architecture, bronze, silver, and gold, ensure that there is a table categorization available at every level, based on who is consuming the data and what the use cases are. All these downstream applications need complex processing.
And imagine when you combine streaming, batch, continuous integration, and continuous enrichment: this turns into a very complex architecture. But the power of a unified platform like Databricks, with Delta as the foundational layer, allows the machine learning engineers, data scientists, and BI analysts to focus on developing their machine learning models, and the engineers to focus on building the complex orchestration and governance, and less on operations and managing infrastructure complexity. So as we move into the Data Science part, we want to think about the Data Science teams and what their problems are. What are they focused on? You can imagine a data scientist using a myriad of libraries and tools to build their models and implement them in the new product layer. So the myriad of libraries and tools available with Databricks, like Horovod, scikit-learn, TensorFlow, and many others, allows the teams to develop models at a much faster rate. The Databricks environment allows machine learning engineers and data scientists at Wildlife to do their job in a much more efficient manner. The platform enables them to work collaboratively and also allows them to perform analytics using their favorite languages, most popularly Python and R. And there are a lot of libraries that run at both small and enormous scale. Wildlife also leverages MLflow to productionize their machine learning models, right from design to implementation, and Arthur will cover more on the MLflow use case. One other key thing I will highlight is the ML runtime, which combines the optimized layer of Spark and Databricks engine optimizations with the libraries, and which allows Wildlife teams to iterate faster.
Before Databricks, they could produce one to two models in a given period, but after adopting Databricks, and especially the ML runtime, they can iterate at a much faster rate. Now they can produce 10 or more models, which allows them to innovate faster. Now what happens with these models and the data that is processed by ETL and Data Science? It is now ready for other, less sophisticated users, so they can leverage the reports, data, and other things made available by our data scientists and data engineers, using Looker as their BI tool. They also have an internal BI tool, which can be leveraged by data analysts and product managers. That’s about the architecture. I will hand it over to Arthur to talk about the use case.
– Thank you Vini. Right now I’m gonna tell you about how we can leverage all this Data Science platform to generate business value. I’ll first describe one problem that we have, a very interesting problem: personalized offers. Then I’ll cover the system that we built to solve this problem, and finally, we’ll go over a little bit of what we are planning to do in the future. Right, so the offer problem is all about finding the right product to show to the right user at the right time. And is it relevant? Yes, it’s very relevant. About 60% of the company’s revenues come from in-app purchases, and out of that, most is offers. And is it relevant to personalize the offers? Again, yes: our users are very different. About 95% of our user base never even make a purchase in our games. And of the 5% that do make a purchase, only 5% are responsible for more than half of the revenue. So our revenue is very concentrated in very few hardcore users. Those hardcore users play the game a lot and spend a lot on our games, and for them we have to show offers that are tailored, right? So we probably have to show offers with a very high value, like a hundred dollar offer with a lot of rewards, because that’s what they want. But for a very casual user, one of those 95% of our user base that don’t make a purchase, showing a hundred dollar offer is probably not gonna make them make the first purchase. We should instead find a very cheap offer that gives them just enough to move to the next level or to buy that extra item that they really wanna get. I’m now gonna use the example of the Tennis Clash offer to explain a little bit about the different dimensions that we see in an offer. The first thing that we started optimizing in an offer was the frequency. We had games with offers that ran monthly and also weekly, but then we started to test offers more frequent than weekly, like daily, and even more often than daily.
We identified the best frequency for each game, and we didn’t think it was a good use of our effort to try to personalize the frequency of the offers. So then we moved on to the next use case, which was to optimize the price. For the price, it is very relevant for us to personalize, right? We need to find the right price for the right user. As I mentioned, the hardcore user probably wants a $100 offer because that gives them a lot of rewards and a lot of benefits, and the casual user is not looking for a $100 offer. The next step was to optimize the discount, and discount is tricky because if you increase the discounts, you probably see an increase in your short-term revenues, right? With higher discounts, more people make purchases, but they make purchases with more value per dollar, and that could lead to cannibalization in the mid to long term and a reduction in the lifetime value of the user. So it’s tricky to find the balance of the discount, and in doing so we generated a lot of value with our system. And finally we have content, which is what we show to the user. And here’s where the dimensionality explodes, right? Because we can offer the user currencies, items, upgrades, a lot of different things, and here we have the problem of how to simplify and look at fewer dimensions so that we can have a model that actually generates business value. This is the system that we built to do that, right?
The main components here are, of course, the app that runs our game, in this example the Tennis Clash game. Our app connects to the Backend Servers, which really control the game progression. The Backend Servers then communicate with our Offers Gateway, which first talks to the Experimentation Platform, where we can define A/B tests and other experiments that we run all the time. The Offers Gateway also communicates with the Model APIs that actually hold the model that makes the prediction we want here. Finally, we have a Feature Store that holds data from the users, mostly behavioral data, right? So what the user does inside the app, and we use that to make our predictions. So let’s have a look at how this flow works. First the game makes a request for an offer because the user is close to the point where they will be shown an offer. The game backend then forwards that request to the Offers Gateway, which in turn communicates with the Experimentation Platform, which stores the configuration of the experiment that this user is allocated to. So basically this experiment will say which group of the A/B test this user should be attributed to and which model the Offers Gateway should connect to in order to get the offer recommendation. Then the Offers Gateway will communicate with the appropriate Model API, defined by the Experimentation Platform, and the Model API is where the magic actually happens, where the recommendation is created. This recommendation is created based on the features that are stored in the Feature Store. So the Model API requests features from the Feature Store, the features come back to the Model API, which generates the recommendation, and the recommendation is sent back to the Offers Gateway and back to the Backend Servers, all the way to be shown to the user as a pop-up. From the Data Science perspective, the main work of the data scientist is in the Model API, right?
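The request flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Wildlife’s implementation: every class, method, feature, and offer name below is hypothetical and merely mirrors the components named in the talk, and the split of users into groups by user ID is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class Experiment:
    group: str       # A/B test group the user is allocated to
    model_name: str  # which Model API the gateway should call for this group


class ExperimentationPlatform:
    """Stores experiment configuration and allocates users to groups."""

    def __init__(self, groups: List[Experiment]) -> None:
        self.groups = groups

    def allocate(self, user_id: int) -> Experiment:
        # Assumption: a simple deterministic split by user id.
        return self.groups[user_id % len(self.groups)]


class FeatureStore:
    """Holds behavioral features per user (here: an in-memory dict)."""

    def __init__(self, rows: Dict[int, Features]) -> None:
        self.rows = rows

    def get(self, user_id: int) -> Features:
        return self.rows.get(user_id, {})


class ModelAPI:
    """Wraps a model: fetches features, then returns a recommendation."""

    def __init__(self, store: FeatureStore, predict: Callable[[Features], str]) -> None:
        self.store = store
        self.predict = predict

    def recommend(self, user_id: int) -> str:
        return self.predict(self.store.get(user_id))


class OffersGateway:
    """Routes an offer request to the Model API chosen by the experiment."""

    def __init__(self, experiments: ExperimentationPlatform, model_apis: Dict[str, ModelAPI]) -> None:
        self.experiments = experiments
        self.model_apis = model_apis

    def get_offer(self, user_id: int) -> Dict[str, object]:
        experiment = self.experiments.allocate(user_id)
        offer = self.model_apis[experiment.model_name].recommend(user_id)
        return {"user_id": user_id, "group": experiment.group, "offer": offer}


def build_demo_gateway() -> OffersGateway:
    # Toy data and toy "models"; a real Model API would load a trained model.
    store = FeatureStore({
        1: {"average_ticket": 80.0},
        2: {"average_ticket": 0.0},
        3: {"average_ticket": 2.0},
    })
    model_apis = {
        "baseline": ModelAPI(store, lambda f: "offer_usd_5"),
        "personalized": ModelAPI(
            store,
            lambda f: "offer_usd_100" if f.get("average_ticket", 0.0) > 50 else "offer_usd_5",
        ),
    }
    experiments = ExperimentationPlatform([
        Experiment("control", "baseline"),
        Experiment("treatment", "personalized"),
    ])
    return OffersGateway(experiments, model_apis)


if __name__ == "__main__":
    gateway = build_demo_gateway()
    for uid in (1, 2, 3):
        print(gateway.get_offer(uid))
```

Keeping the model choice inside the experiment configuration, as in this sketch, means a new model can be tested against a baseline without changing the game backend.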
So it’s there, in that pickle file inside the Model API, that the recommendation is actually generated. And before we cover the Data Science work: we also store information from each of these steps of the process to monitor the health of the system and also the business performance. Now, to help the data scientists create and train models, we created an in-house library that we called da Vinci, and it’s based on a few ideas. The first one is that it should be application agnostic, right? So basically we could use this library to train any kind of model. Second, it’s totally MLflow integrated, so you don’t have to worry about versioning your datasets or versioning your models. It does all the versioning for you, and you can use that to your benefit. Also, you can run this library locally or on Databricks clusters. So if you wanna do a quick experiment locally, you can do that. But then when you figure out that you need more computational power, you move on to a Databricks cluster and let it do the heavy lifting for you. Also, we standardized the training process and the deployment process to make it faster for us to iterate, and we automated reporting, showing business metrics, model errors, and feature importance. And finally we wrapped Hyperparameter optimization so that we also make that work easier for the data scientist. I’ll show you now an example of a notebook that does the training process for a specific model that we have. The first step in creating a model for an offer recommendation is to run a data collection process that creates the dataset. I’ll not cover this today, perhaps we can have a separate presentation to talk about the data collection process. Here I’ll focus on the training process. I’ll go through installation of the library, then definition of game specific parameters, training the model, testing against baselines, and how we do the deployment.
I am now gonna show you the da Vinci library, our in-house library to train models and deploy them in production. The first step to use the da Vinci library is to install it, right? As we have da Vinci in Wildlife’s Artifactory repository, that’s very easy. We just have to run a pip install command and we have it installed. The second step is to define game specific settings. The first and most important one is the training set, right? The training set was generated in a previous process, the data collection process, and we can select a game and the ID of that dataset. This ID defines which file should be read, the one that contains that dataset. And then we can run a function that just gathers that dataset. We can have a look at the dataset. For this example I’m using a subset of our data that has 1,000,400 lines, and here is what it looks like. So basically we have a user ID, an internal user identifier, and we have a lot of features. So for example, average ticket, spend velocity, days since install, number of matches, and many others. And in the end, we have a target variable that in this case just says whether the user purchased the offer or not, given the opportunity of making a purchase. And now we define here a mapping from the output of the model to the actual input that the game backend will receive, right? So the payload that the game backend will receive and transform into an offer. This is what this piece of code is for. The third step is to actually train the model. We will train a very simple model here, what we call the CompoundClassifier. The idea of the CompoundClassifier is that we have several different classifiers, and each classifier gives us the probability of a user purchasing one kind of offer. So say that one kind of offer is a $5 offer that gives 100 gems and one item, and another kind of offer would be a $10 offer with 200 gems and two items, for example.
So we have different kinds of offers with different prices and different contents. And the idea is that this classifier will give us the probability of purchasing. And if we have an estimate of how much this purchase means in terms of long-term revenue for the user, we can just create a measure of the expected value of showing that offer to the user, and pick the offer that has the highest expected value. So to define the CompoundClassifier, we create the metadata for the model: the project name, product name, and version. Then we define the structure of the dataset. These parameters tell the library what the target variable is, what the ID is, and also other important information from the dataset that the library will use. Then we define the training parameters. These training parameters are things like whether we are training with multiprocessing, how we should impute missing values, how we should encode categorical variables, and also parameters that define the Hyperparameter optimization. Finally, we have to define the model or the models that should be chosen from, and in this case, just for simplicity, we’re using only a random forest classifier from the scikit-learn library, right? Here we are defining the search space, and we can also define some callbacks, which are different ways to handle the versioning of our checkpoints in MLflow. We have a standard way of creating these callbacks, so we usually don’t have to worry about this, but if we want some specific different kinds of callbacks then we can implement them here. And now we instantiate the CompoundClassifier class with all the parameters that we defined in the previous sections. Now we are ready to train our model. At this point, I ran this model training. It took six minutes to run.
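The expected-value selection described earlier can be written out as a small Python sketch. It is not the da Vinci API: the function name, the offer names, and all the numbers are hypothetical, and it assumes the per-offer purchase probabilities and long-term value estimates have already been produced elsewhere.

```python
from typing import Dict


def pick_offer(purchase_prob: Dict[str, float], offer_value: Dict[str, float]) -> str:
    """Return the offer with the highest expected value.

    purchase_prob: probability that the user buys each kind of offer
                   (in the talk, one binary classifier per offer kind).
    offer_value:   estimated long-term revenue if that offer is purchased.
    """
    expected_value = {
        offer: purchase_prob[offer] * offer_value[offer]
        for offer in purchase_prob
    }
    # max over the dict keys, ranked by their expected value
    return max(expected_value, key=expected_value.get)


# Hypothetical value estimates for three kinds of offers.
OFFER_VALUE = {"usd_5": 5.0, "usd_10": 10.0, "usd_100": 100.0}

# Hypothetical classifier outputs for a casual and a hardcore user.
CASUAL_USER = {"usd_5": 0.20, "usd_10": 0.05, "usd_100": 0.001}
HARDCORE_USER = {"usd_5": 0.50, "usd_10": 0.40, "usd_100": 0.10}

if __name__ == "__main__":
    print(pick_offer(CASUAL_USER, OFFER_VALUE))
    print(pick_offer(HARDCORE_USER, OFFER_VALUE))
```

With these made-up numbers the cheap offer wins for the casual user and the expensive one for the hardcore user, which matches the intuition given earlier in the talk.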
The reason why it took six minutes to run is that the model did not actually perform the whole Hyperparameter optimization. As I had run this with the exact same parameters before, the system identified that this Hyperparameter tuning had already run and had stored the best Hyperparameters that it found, so it just applied these parameters and ran the training with them. That’s why it took this amount of time. Otherwise it would have taken way more time, and this is very handy because it helps us not to train over and over again with the same parameters. The output of the model is basically a set of different binary classifiers. As I explained before, each of them gives us the probability of a user making a purchase on a specific kind of offer. And after we have a run, the running process creates a run ID. This run ID is very important for future use in the deployment process, because this is how we identify the model that we will put into production. But before putting the model into production, we wanna make sure that the model is any good, right? Not only in error metrics, but also in what we expect the production metrics would be. Here I’m gonna compare our model to the simplest model possible, which is just recommending random offers to users. I just have this simple piece of code here that generates a report, and this report gives me an estimate of what the performance of our trained model would be against just recommending random offers. So this is what this report represents, and it feeds into a visualization that we have that helps us decide if we wanna put that model into production or not. Finally, we have the deployment process. The deployment process is basically running one command using that run ID that we defined before, right?
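The idea of skipping a Hyperparameter search that has already been run with identical parameters can be sketched as follows. This is a minimal illustration under stated assumptions, not how da Vinci does it: the talk says the results are stored through MLflow, whereas this sketch uses an in-memory dict as a stand-in, and the class, function, and parameter names are all hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict


def config_key(config: dict) -> str:
    """Stable fingerprint of a training configuration."""
    payload = json.dumps(config, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TuningCache:
    """Remembers the best hyperparameters found for each configuration."""

    def __init__(self) -> None:
        self._best: Dict[str, dict] = {}  # stand-in for a persistent store
        self.searches_run = 0

    def best_params(self, config: dict, search: Callable[[dict], dict]) -> dict:
        key = config_key(config)
        if key not in self._best:
            # Only pay for the expensive search the first time.
            self._best[key] = search(config)
            self.searches_run += 1
        return self._best[key]


def toy_search(config: dict) -> dict:
    # Stand-in for an expensive search: pick the grid value closest to 120.
    best = min(config["n_estimators_grid"], key=lambda n: abs(n - 120))
    return {"n_estimators": best}


if __name__ == "__main__":
    cache = TuningCache()
    config = {"dataset_id": "demo", "n_estimators_grid": [50, 100, 200]}
    print(cache.best_params(config, toy_search))
    print(cache.best_params(config, toy_search))  # served from the cache
    print("searches run:", cache.searches_run)
```

Hashing the serialized configuration with sorted keys means two runs count as "the same" only when every parameter matches, regardless of the order in which the parameters were written.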
So the run ID defines which model we wanna put into production, and with one command, sending the run ID as a parameter, we start an automation that instantiates a pod that already has the right image with our model on it, and we can perform tests on this model in staging, right? And then finally, when we’re happy with it, we can press another button and have a deployment to production. The same run ID that we got from the training is the one that we use to define the deployment to the staging environment and then to the production environment. And finally, it is the same ID that we use when we set up an experiment. When we set up an experiment, we say please reach out to this model here, and this is the configuration that goes to the Offers Gateway and targets the user to see the recommendations from this model, as opposed to other models, as we saw in the previous sections. And all of this that I presented is not just cool, it’s also very profitable. We’ve been seeing great results from all of this, a 4% to 14% increase in revenues in our major games, all from the development of this system. This is compared to baselines that were handmade offers by game designers and PMs who created segmentations. We are now saving them the time of creating those segmentations and those offers, and we are actually even improving the performance. So that’s great. And what’s next? Maybe Reinforcement Learning. Up to today we’ve been facing the problem as how to find the best next offer to show to the user to maximize our return in a short-term window. But what we actually want to optimize is the sequence of offers that maximizes long-term reward. Right? Second, we want a framework for retraining our model that takes into account the explore-exploit tradeoff. What I mean by that is that in the beginning, we want to explore a lot of different possibilities of offers.
And then as we figure out which are the best ones, we wanna start to exploit those, but then as the game changes and the market changes, we will probably find that the best offers are not the best offers anymore. So we wanna be able to explore efficiently and then find the next best offer to show to the user. And finally, we have a lot of dimensions, right? We have a lot of different possibilities of offers that we can show to the users, and especially when we are optimizing content, this becomes a big problem. We believe that Reinforcement Learning could be a good solution for all these problems, but we still don’t have a Reinforcement Learning model online. When we do, maybe we can have another conversation, and I will keep you posted on our progress. Right, you could be thinking: is Data Science at Wildlife just offer personalization? And the answer is by no means, we have plenty of other use cases here. We use rating and matchmaking systems to predict the performance of a user in a match, and then create fair match-ups between users, so that our games are fun and engaging and people feel that they are playing fair matches against people who play somewhat like them. We use advanced experimentation techniques such as Bayesian optimization and multi-armed and contextual bandits to test different hypotheses, learn more about our product, and make our product as good as we can. We also have LTV models that are optimized to make our marketing operation more profitable. And finally, we also have an ads monetization system that continuously tries to maximize the ads revenue that we obtain by showing ads inside our apps. This was a small glimpse of Data Science at Wildlife. Thank you so much for your time, and now we are open to questions.
Vini Jaiswal is a Senior Developer Advocate at Databricks, where she helps data practitioners to be successful in building on Databricks and open source technologies like Apache Spark, Delta, and MLflow. She has extensive experience working with Unicorns, Digital Natives and some of the Fortune 500 companies helping with the successful implementation of Data and AI use cases in production at scale for on-premise and cloud deployments. Vini also worked as the Data Science Engineering Lead under Citi's Enterprise Operations & Technology group and interned as a Data Analyst at Southwest Airlines. She holds an MS in Information Technology and Management from the University of Texas at Dallas.
Arthur Gola is the head of product data science at Wildlife Studios, where he leads data scientists to optimize the company's mobile games through data-driven decision making and user experience personalization. Previously, he was a data science consultant for big corporations, having developed projects such as recommending the assortment of products for the physical stores of a large retailer. Arthur studied mechatronics engineering at the University of São Paulo, Brazil, and was once an athlete, achieving the title of national rowing champion five times.