Driving gamer personalization for millions
Reduction in data processing time¹
Better AI content recommendations²
Cost reduction compared to previously utilized standard VM
With over 300 million copies sold, Minecraft is the best-selling video game of all time and continues to grow after nearly 15 years since the game first launched. Mojang Studios, the studio that develops the game, continually collects game and player telemetry data so that it can make Minecraft even better. That’s a massive challenge because Minecraft uses an open-sandbox format, in which players can do whatever they want. As users engage with friends, build houses, fight mobs and explore the terrain, Mojang Studios works to understand their behavior by collecting player data on its Bedrock game. Mojang Studios then uses data and AI to personalize the player journey by making content recommendations in Minecraft Marketplace. All the while, the studio harnesses technology to leverage social media and gauge player sentiment, looking for signals that would indicate potential issues or opportunities to improve the game.
Data processing bottlenecks leave critical business questions unanswered
For years, Mojang Studios processed and analyzed this data using virtual machines, Azure HDinsight and the Apache Hive data warehouse. As the size of the Minecraft player base grew, this combination of technologies struggled to scale. Before they could analyze data and create models, the company’s data scientists needed to download individual data sets from relational databases and coalesce them in virtual machines. Collaboration was limited because they coded in different languages and shared code on an ad hoc basis. Processing data in Hive for data science purposes was slow, and data scientists were forced to narrow their analysis to a single day of behavioral data for many analyses.
“The reality of video games is that we need to analyze data on a time series basis to really understand trends and seasonality,” explained Francisco Rius, Head of Data Science — Minecraft, Microsoft. “Looking at one day simply can’t show you how your business is evolving — but due to the difficulty of processing data, that’s what our data scientists were resorting to.”
When data scientists needed to create data stores or export data, they relied on a data engineering team that was too small to keep up with demand. And the only way for data scientists to feed data back into the game and make improvements was to create a feedback loop from a virtual machine that was designed for analytics. Because of this technical fragmentation, many of Mojang Studios’ most important business questions were going unanswered.
“Were our players having fun in Minecraft?” Rius asked. “Did they prefer combat or creating and exploring? Which segments of players were more engaged in-game? And how could we better optimize our recommendations in Minecraft Marketplace? These are the kinds of unanswered questions that made us look for a unified data environment.”
AI and machine learning power a better gaming experience
A key priority for Mojang Studios was to help its data scientists collaborate, share code and learn from each other as they worked. The company considered building its own data platform based on virtual machines that would provide data scientists with notebooks. But that approach would have forced Mojang Studios to convert its entire data ecosystem to SQL technology and create relational databases for the massive amounts of data it was collecting. In the end, Rius and his team decided they needed an Apache Spark™ engine at the core of their new solution. Based on recommendations from former employees, they launched a proof of concept on Databricks.
“Our first impressions of Databricks were great: our proof of concept was supposed to take a couple of months, but we had it up and running in four or five days,” reported Rius. “That’s because Databricks plays nicely in the Azure ecosystem blob that we were using, and our Hive meta store was easy to mount in the Databricks environment. Our team started adopting Databricks without us even pushing them. And when we did a cost analysis after the fact, we found that Databricks wasn’t costing us any more than our previous environment.”
Since then, the lakehouse architecture has opened a world of opportunity for Mojang Studios. The company now orchestrates all machine learning and AI activities through Databricks. Data scientists have taken on basic data transformation and automation tasks that the data engineering team used to handle. The data science team can work with data in their notebooks, process and analyze it, and then push it back into Minecraft services when they’ve finished.
Mojang Studios’ main machine learning and AI use cases revolve around optimizing recommendations in Minecraft Marketplace. The company uses algorithms that borrow best practices from several industries to deliver personalized recommendations for millions of players each day. A similarity search engine, powered by several Microsoft algorithms, lets players click on similar content that will enhance their gaming experience. “These algorithms were basically plug-and-play in the Databricks environment,” reported Rius. “By building our own recommender system in Databricks, we’ve been able to make major strides in optimizing Minecraft Marketplace. This kind of work was cumbersome before, but now it’s totally integrated with the game.”
Mojang Studios has also integrated its Databricks environment with Azure Cognitive Services, which lets the data team use natural language processing to gauge what 130 million users are saying on social media. “The Azure Cognitive Services API worked seamlessly with Databricks,” said Rius. “We can now run a quick algorithm to understand player sentiment, identify the main topics of social conversation and find out whether there are any negative sentiments to be concerned about.”
With Databricks, Mojang Studios is also performing deep data processing to understand the relationship between the quality of the game and player outcomes. “We can now determine whether reaching a specific milestone or hitting a certain bug will make a player more or less likely to continue with the game,” Rius explained. “Machine learning and AI are helping us process data at scale and understand the big signals. With Databricks and Azure, we’re using machine learning as a service rather than having to build these capabilities from scratch over time.”
Processing data 66% faster leaves more time for innovation
Mojang Studios measures the value of its Databricks implementation largely in terms of time saved. Before Databricks, Mojang Studios’ analysts would often spend most of the day processing data, and their output required significant iteration. Now that the company has dramatically reduced its data processing time, analysts are focusing on answering high-ROI questions. “We’re making better decisions about marketing, social media and the overall direction of our business,” said Rius. “These are areas where we previously lacked the time to offer strategic input because of all the data work we were doing.”
Rius ran a long, intensive query in both Hive and Databricks and discovered that Databricks delivered 66% better performance. “That opens up a ton of time for our relatively small team to focus on insights and analysis, rather than wrangling and processing data,” said Rius. “I’ve found that we’re able to spend significantly more time gathering impactful insights rather than manipulating the data.”
But Mojang Studios’ transition to Databricks has bottom-line benefits, too. Today, Mojang Studios is spending up to 20% less on Databricks than it did on the previous setup on a like-for-like basis. And data scientists finally have the time series view they require to understand player needs and preferences to deliver a gaming experience that keeps players coming back for more.
“Databricks gives us the perspective we need to know what our players want,’ Rius concluded. It helps us release gameplay updates to maximize impact, and we’re delivering recommendations tailored to users’ behavior. Without Databricks, these capabilities would have taken us many years and untold budget to build. We would definitely make the same choice again.”
1. Based on benchmarking of a recurring analytics process using virtual machine vs. Azure Databricks
2. Based on the close rate of AI curated recommendations in Marketplace vs. heuristic or manually curated content recommendations