In the online gaming industry we receive a vast number of transactions that need to be handled in real time. Our customers get to choose from hundreds or even thousands of options, and providing a seamless experience is crucial in our industry. Recommendation systems can be the answer in such cases, but they require handling large volumes of data and large amounts of processing power. Towards this goal, over the last two years we have taken the road of machine learning and AI in order to transform our customers' daily experience and upgrade our internal services.
In this long journey we have used Databricks on the Azure cloud to distribute our workloads and get the processing-power flexibility that is needed, along with the stack that empowered us to move forward. By using MLflow we are able to track experiments and model deployments, by using Spark Structured Streaming and Kafka we moved from batch processing to streaming, and finally by using Delta Lake we were able to bring reliability to our data lake and assure data quality. In our talk we will share our transformation steps, the significant challenges we faced and the insights gained from this process.
Speakers: Stefanos Doltsinis and Kostas Andrikopoulos
– Hello, everyone. First of all, I would like to say that, along with my colleague Kostas, we're very happy to be here today at the European Data + AI Summit and to present to you our personalization journey from single node to cloud streaming. Our aim is to present all the bottlenecks and the challenges we faced along this way, and also to share with you the way we went forward and what we did to get through this process. So, before I get into the technical part, let me first introduce myself. My name is Stefanos Doltsinis and I'm a Machine Learning Architect at Kaizen Gaming. My co-author, co-speaker and colleague is Kostas Andrikopoulos, and he's a Big Data Architect at Kaizen Gaming as well. Before I get into the technical part, let me first tell you a little bit about Kaizen, our company. Kaizen is a top GameTech company in Greece and one of the fastest growing in Europe. We currently operate under two brands: Stoiximan for Greece and Cyprus, and Betano, which operates in Romania, Germany, Portugal, and recently in Brazil. It didn't take us too long to realize that, if we want to offer a good product to our customers, we should invest in technology. Our aim is to provide more personalized content and also personalized offers. So, what we thought is that we should take the road of machine learning and, with this in mind, develop some machine learning applications and models, in order to automate our decision process and provide more personalized content to our customers. But before I get into what we do today, let me share with you a little bit of our history and our initial workflow. What you see on the right-hand side is the number of different components we used to use. We used to have several data sources: a data warehouse, a lot of databases, files and so on. And this is a very big challenge, if you don't have a single point of truth, a single point of reference.
We also used to use our local workstations for training. And this, as you can imagine, if you have a lot of data, also becomes a very big challenge to go through. And finally, in terms of the application, we had the model and we had the application, and in order to push that to production, we had to build an immense deployment that had to go through a process that is complex on its own. So, as you can imagine, with this initial workflow we had a number of issues, a number of challenges. What I've tried to do in this slide is to break those down into machine learning and application. In terms of machine learning, I want to mention three different aspects: I want to focus on data, on our features, but also on our model. From a data perspective, the first thing we realized was going to become an issue was data availability. And what I mean by that is that everything's straightforward if you want to use one or two weeks of data, but everything becomes a very big challenge if you want to train with one or two years; this is where your local workstation shows its limitations. Another vital thing that I want to mention is what we call time traveling. And by time traveling we mean that, if you want to get the state of your data at some specific point in time, and you haven't catered for that while storing the data, then it's very difficult to fetch. On the feature side, we faced a lot of issues with recalculation. What I mean by that is that you put a lot of effort into calculating your features, and if you don't make sure that you calculate them properly and store them properly, then you will end up in a situation where you calculate them again and again, wasting all the processing power you're putting into that. Finally, on the modeling side, versioning was a very big challenge.
It wasn't always that easy to make sure that what is running in production was also what you trained and what you wanted to push to production. So, that was about machine learning. Now, on the application side, I will not go through all of the bullet points; I will mention scalability, which we thought was very important. You always need to have in mind that everything comes from your front application, and your machine learning application should be ready to consume the data, scale for this data and be ready to process any upcoming spikes. So, this is where we used to be, and these are some of the challenges we faced; I'm sure all of you have faced similar challenges. And this is the point where we decided that we should go forward, change our approach and move to a big data infrastructure. And about this direction and this process, my colleague Kostas will talk to you.
– So, if we take a closer look at the challenges that we were facing back then, you'll see that there are typical issues related to the life cycle of machine learning products, where MLflow can be really helpful. We were also facing challenges with the velocity and the volume of the data that we needed to process, and by using Databricks on Azure we got not only the scalability, but also the elasticity that we need. If we fast-forward to where we stand right now, about six months later, you'll see that we have hundreds of data pipelines, starting from traditional ETLs: we extract our data from various data sources and we store them in our Delta Lake. And we are now able to reason about the correctness of our data, thanks to the guarantees that we have and the schema enforcement. As Stefanos mentioned earlier, it is really important for ML applications to have a single source of truth. So you see, it's easier to start developing your pipelines, starting from feature generation up to model training and performing predictions. And now, with the use of MLflow, we are able to collaborate more efficiently. Another thing that I would like to highlight at this point is that all this progress wouldn't have been possible if we weren't able to reuse part of our code. And I'm not only referring to the Python code that was part of our libraries and was ported into Databricks notebooks, but also to our SQL transformations, which were part of our data warehouse ETLs and were ported into Spark SQL. So, having a proper stack in place is really important. But what's equally important is to be able to design your pipelines in a proper way. So, having the proper stack is one part of the solution; what's equally important is an appropriate design for your pipelines. In order to do so, you need to think: what is your input data, and what is the optimal output of your data?
And you should be able to reason about your choices by thinking of what is the latency you are trying to achieve, and who is going to perform the actions. When you answer those questions, things become clear. So, now we'll present two distinct use cases, and we'll see how the differences in their requirements affect our design. We start off with a use case where we use our latest data in order to feed our model and perform predictions. And actually, what we're trying to achieve is to feed those predictions directly to the microservices, in order for them to take some actions. So, what we have is that we start with structured data in Avro, stored in Kafka, and we try to keep the latency in the order of a few seconds. The reason why we want to do so is because there is no human interaction in this pipeline; we want our microservice to be able to perform some actions immediately after the predictions are ready. So, the best way to achieve this is to store the data, in Avro format, back into Kafka, where our application can directly consume it. I believe that by now, how we are able to achieve this is pretty clear: we use Structured Streaming to perform the feature generation, store those features back into Kafka, and have a second Structured Streaming job that reads those generated features, feeds them into our model and stores the predictions back into Kafka. And the reason why we're using two Structured Streaming jobs is that we are able to scale better. So, whenever you find yourself with an event-driven application, where you want to take immediate action when your data is available, it makes sense to store your data in Kafka, because not only are you able to achieve low latency, but you are also able to handle issues with slow consumers. So, let's now see a more complicated design, where we have three major differences.
The first one is that now our application is not event-driven, in the sense that it runs periodically and uses all the available information to perform actions. The next major difference is that now our model is rather complicated: it uses way more features in order to generate its output. And also, some of the features require not only recent data, but also historical data. So, if we switch back to our "what and why" questions: we now have not only structured data in Avro stored in Kafka, as before, but we also have Delta tables that hold the historical data. And what we're actually trying to achieve is to match the latency of our pipeline with the periodicity of our application. Since our application uses all the available information, it makes sense to store our data in a SQL database, and we use PostgreSQL for this. And also, because we want to keep a history of our predictions, we store the results in a Delta table as well. So, how did we achieve this, and what were the challenges that we had? First of all, we use Structured Streaming to join the recent data with the state, in order to perform the feature generation part. And because our features also take part in different feature vectors, we store them as Delta tables and actually use them as our caching layer. The second challenge that we had is that it's not feasible to join hundreds of features together as streams, because there are many limitations when you use Structured Streaming. So, as a second step, we switch to batch processing: we take all those Delta tables and join them together to compute the feature vector. And as the last step, we use, once again, Structured Streaming to get the latest data from our Delta table and feed it into our model to perform the predictions, which we store back into PostgreSQL and also in a Delta table.
And the key takeaways from this design are that, whenever you have features that are computationally heavy, it makes sense to store them in Delta tables, because there are actually a lot of optimizations that you can perform. Also, whenever you find yourself in need of joining multiple features, it makes sense to switch to batch processing. And last, if your application doesn't have the limitation of using a SQL database, Cassandra would actually be a very good fit for this use case. So, I believe that by now you have a good understanding of our design. Stefanos will now walk you through the details of our personalization journey. So, Stefanos.
– So, Kostas talked about a couple of different architectures that we're using in our day-to-day pipelines and for our streaming. But our aim is to offer personalized offers and personalized products to our customers. With this in mind, we have developed a couple of applications that I'm going to talk to you about now. First of all, we have what we call sportsbook personalization. Before I get into the details of this project, I'll first say a couple of things in order to explain the rationale behind it and what we're looking for. In our day-to-day operation, we have around 3,000 unique games that generate a number of different markets, and those markets are the final product for our customers. So, our customers have to work their way through all of this information in order to find what they're looking for, and at times this can be very frustrating. What we thought was that, if we can offer personalized content, filtered down to what they're looking for, to what they want to bet on, then we can improve their experience and ultimately increase their loyalty. From a technical perspective, we decided to build a recommendation system, and to do that, we went with a collaborative filtering approach. In every collaborative filtering approach, what you do first is build the utility matrix, which holds the preference of every customer. If you take the matrix, on one axis you have all of your customers, and on the other axis you have all of the historical events that every customer can choose from. Initially, you use their historical preferences to fill in those customer-event combinations with some kind of rating. As you can imagine, this is a very sparse matrix, and it's sparse because most customers have only chosen to play, in the past, just a few of the potential upcoming events.
In a collaborative filtering approach, what you do is take the sparse matrix and move it to a latent space that tries to capture all of the customer-game combination dynamics. Once you've done that, you bring it back and, through those dynamics, you try to fill in all of the gaps in the sparse matrix. Those gaps are, finally, the preferences of your customers, and based on the ratings that you have estimated, you choose to offer your customers the top-rated games. In order to do that, we use the Spark machine learning library, and we chose to go with the alternating least squares (ALS) algorithm. On a daily basis, we train our algorithm, which means that we fetch around 600 million transactions, because we use one year of data, and we produce a utility matrix which is about 400K customers times 300K unique games. As an end result, we export 500 million daily recommendations. I have to say that we do the daily training because we want to capture the latest preferences of every customer; every day you have incoming data that you want to incorporate into your final model. A very big challenge that we faced in this project, and which I want to highlight, is what we call dynamic content matching. Our content is very dynamic, so the upcoming content is not necessarily something that we have seen in the past, and we had to find a way to link the upcoming events to our history, in order to make the most use of our historical training data. Finally, at this point, I want to mention the metric we're using to evaluate this project: the mean average precision on the top 100 games, where we're performing at an average of 70%. So, that was about the sportsbook personalization. But as I said, we have a couple of applications, and the next one is what we call real-time compensation.
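In production this runs on Spark MLlib's ALS; purely as a single-node illustration of the underlying idea — factorize the sparse utility matrix into latent factors, then use the reconstruction to fill in the gaps and rank unseen games — here is a toy alternating least squares in NumPy. All data, dimensions and hyperparameters are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy utility matrix: 6 customers x 5 games; 0 means "never played" (missing).
R = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 0],
    [1, 1, 0, 5, 0],
    [1, 0, 0, 4, 5],
    [0, 1, 5, 4, 0],
    [2, 0, 4, 0, 3],
], dtype=float)
mask = R > 0            # observed entries only
k, lam = 2, 0.1         # latent dimensions and L2 regularization

U = rng.normal(size=(R.shape[0], k))   # customer factors
V = rng.normal(size=(R.shape[1], k))   # game factors


def solve(fixed, ratings, mask_rows):
    """Closed-form least-squares update for one side, observed entries only."""
    out = np.zeros((ratings.shape[0], k))
    for i in range(ratings.shape[0]):
        obs = mask_rows[i]
        A = fixed[obs].T @ fixed[obs] + lam * np.eye(k)
        b = fixed[obs].T @ ratings[i, obs]
        out[i] = np.linalg.solve(A, b)
    return out


def masked_mse():
    return float(((R - U @ V.T)[mask] ** 2).mean())


before = masked_mse()
for _ in range(10):                    # alternate the two least-squares steps
    U = solve(V, R, mask)
    V = solve(U, R.T, mask.T)
after = masked_mse()

# The dense reconstruction now scores every customer-game pair; per customer,
# the highest-scored *unseen* games become the recommendations.
scores = U @ V.T
top_unseen = np.argsort(np.where(mask, -np.inf, scores), axis=1)[:, ::-1][:, :2]
```

At the scale mentioned in the talk (400K × 300K), this same alternating scheme is what `pyspark.ml.recommendation.ALS` distributes across the cluster.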
The rationale, the motivation behind this project is very clear, if one considers the fact that rewards increase loyalty. And I don't think this is debatable, if you consider that around 40% of our customer support communication is about some kind of reward, some kind of bonus request from our customers. This communication ends up in about 4.5 million reward assessments per year. I'm not sure if this is a big or a small number to you, but I do know that it has two different aspects. On one side, all of those assessments need to be done manually and periodically, which means that we need to have people doing a repetitive task at a very large scale. On the other side, we have our customers, who need to wait for this periodic assessment and, most importantly, will end up with an offer which is not necessarily personalized. Having this in mind, we thought that we should design a system that would be real time and provide real-time decisions on bonus allocation. From a technical side, what we did in this project is what you see on the right-hand side of the slide: one of the architectures that Kostas presented earlier. This architecture fits very well with what we've done in this project, where we're streaming our data and using those streamed data to calculate a number of different features; essentially, in the end, what we want is to stream our predictions. To do that, we're using a binary classification model: again, a Spark machine learning library algorithm, a gradient boosting algorithm. And finally, we stream the prediction, which ends up as an allocation of a specific bonus for every one of our customers. But what I want to highlight in this project is the use of MLflow, because we're using a number of its different aspects.
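The production model is Spark ML's gradient-boosted trees over the streamed features; as a small single-node stand-in, scikit-learn's gradient boosting classifier on synthetic data shows the shape of such a bonus-allocation scorer. The feature names and labels here are entirely invented:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)

# Hypothetical per-customer features, e.g.
# [bets_last_week, avg_stake, days_since_last_bet] (standardized).
X = rng.normal(size=(400, 3))
# Synthetic "allocate a bonus" label, driven by the first two features
# plus a little noise.
y = (X[:, 0] + 0.5 * X[:, 1] + 0.2 * rng.normal(size=400) > 0).astype(int)

# Gradient boosting: an ensemble of shallow trees fit on residuals.
model = GradientBoostingClassifier(n_estimators=50, max_depth=2)
model.fit(X[:300], y[:300])

accuracy = model.score(X[300:], y[300:])          # hold-out accuracy
scores = model.predict_proba(X[300:])[:, 1]       # per-customer bonus score
```

In the streaming pipeline, these per-customer scores are what gets written back out as the bonus-allocation decisions.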
We're using experiment tracking, we're using model deployment, but we're also using the model registry. And I have to say that I'm really surprised and fascinated with model deployment, because you can have your setup, you can have your training pipeline, you can change whatever you want to change, and then, if your model actually performs well on your validation set, you can instantly deploy it, just by changing its state to a production model. This is a very big difference from what we used to have before, where we had to go through a whole deployment process to get to production. As the last thing, I want to say that the model registry has also helped us very much, because you can always go back and see how your model used to perform, and then compare that to your current performance; this is something that helps you a lot with monitoring the application. So, this is where we are today. This is what we've done on those couple of use cases, but this is not where we want to stop. We have a very big vision and we're aiming for it, and Kostas can talk you through it.
– So, our next step is to move closer to a Kappa architecture, where we want to switch from batch processing to real-time processing. Also, we'd like to evaluate the use of Cassandra, for the use case that Stefanos mentioned before, as a replacement for PostgreSQL, but also as a feature store, in order to be able to reuse our features. And since our experience with MLflow has been really amazing, our next stop is to use model serving, which seems to be a perfect fit for various use cases that we have. Last but not least, we also want to use Redis in cases where our applications perform simple key lookups. So, please stay tuned for our next journeys. At this point, we would like to thank you all for attending this session and to remind you that your feedback is really important to us, so please don't forget to rate and review the sessions. Thank you.
Stefanos Doltsinis is a machine learning architect at Kaizen Gaming, working on the organization's transformation into the AI era. He has worked in academia and research with a focus on applied machine learning in the areas of robotics, production automation and energy consumption optimization. He holds a PhD from the University of Nottingham with a focus on Reinforcement Learning.
Kostas Andrikopoulos is a big data architect at Kaizen Gaming, where he leads the development of the high-performance, in-house, real-time data platform that processes hundreds of millions of events daily. With more than 15 years of experience in software development and distributed systems, he holds a Master's in Digital Networking and Telecommunications from the University of Piraeus. Kostas is passionate about Functional Programming, Distributed Systems, Data Lakes, Streaming, Apache Spark and Machine Learning.