Whether you are working on a live title, pre- or post-production, ongoing maintenance, a future release, another version of a game, or a brand new title for the market, you're always looking for feedback from the community. There's no shortage of it out there, but it can be overwhelming and hard to sift through. For games shipped on PC and sold through Valve's Steam Store, a great source of player feedback can be found in Steam's game reviews. We have built a new solution accelerator for Player Review Analysis (available on GitHub) that combines natural language processing (NLP) and machine learning techniques to help game developers understand their players better and respond through their game design, backend operations, LiveOperations, marketing and, truly, all lines of business.
With Steam's game reviews, you have the opportunity to see:
- Raw feedback: The player's unprompted words. What they felt most passionate about: positive or negative
- Feedback over time: For players as a whole, or even individual players
- Feedback as a relation to time played: When are people most positive? If they only played for 4 hours? If they hit 100%?
- Feedback for other titles: What are the things that people harp on the most when talking about an aRPG, or an RTS? Does it differ depending on A, AA, AAA+?
Say you've gathered this feedback and you've got it in your data platform. What's next? How does one make sense of it all? Reading through hundreds or thousands of plain text reviews (unstructured data) to reliably find patterns and insights can be daunting.
This is where the power of natural language processing comes in. With this machine learning (ML) solution, you can extract the key terms in each review and their associated positive, neutral, or negative sentiment. Because ML scores every review the same way, it can help mitigate the biases of manual reading and show what the data is really trying to tell you. This insight can happen at an aggregated or player-specific level. When analyzing your own title, you will have access to your Player ID and be able to align that with Steam's Game ID. With this, you can augment your player360 datasets with the sentiment expressed on Steam, enabling you to proactively take action to improve engagement, retention and revenue metrics.
Imagine a high value player has just dropped an incredibly negative review. The sooner you spot that connection, the faster you can take action to mitigate the issue, engage with the player (and broader community) directly and improve your chances of retaining them. This type of analysis is especially critical for live service titles, which ship in cycles of constant iteration.
The insight derived is useful across the board:
- Backend Operations: What parts of the backend are driving frustration? Is it lag, server stability, matchmaking time? Advanced: How could we allocate our backend resources to improve the performance for high value players experiencing these issues?
- Community and Support: Identify sources of friction for players. What's driving them mad? Advanced: Build out responses to the top issues players are experiencing so that Community Managers and Support can respond in a meaningful way and, ideally, allay concerns based on planned improvements.
- Game Design: What do people feel is weak, or overpowered (OP)? What game modes are they enjoying the most? Which modes would they like more of, but maybe aren't hitting the mark today? Why aren't they hitting the mark? Advanced: Cross reference suggested improvements against internal player segmentation, play time and other cross-org data points.
- Marketing: Why are people loving your game? What's getting them excited? When you look at the positive reviews, what are the trends, and why are players engaging? Take this insight and align your ad creatives, ad/email campaigns and re-engagement methods to what's most exciting for your players. Advanced: Integrate player segmentation across revenue, play style, and other views to create segment+excitement focused outreach that feels personalized to the player.
- LiveOperations: How are your LiveOperations events being received? Which ones are people most excited about, or disappointed in? You'll see the net effect of this through revenue transactions during an event, but you won't get the feels there. Here you'll understand the reason for those revenue results. Advanced: Explicitly join event-focused feedback with revenue results for your events and operational challenges. Your event might have been great, but you had major server issues impacting one geographical segment that became the vocal minority in your reviews. Only by joining these disparate insights would you see that the reviews are steering you in the wrong direction (from an event perspective) and that it's really a backend operations challenge to address.
Now that we understand the why, the how and the impact, let's get to the fun stuff!
In the sections below, we walk through how to take reviews from Steam and process them, curating unstructured text into actionable data.
Note: Though we only cover Steam, the same pattern can be applied to many other sources of data.
1. Data Ingestion and Social Media APIs
In the data ingestion phase of the sentiment analysis solution, we utilize the Steam API to gather gaming reviews. This raw data is cleaned to remove any irrelevant or corrupt data, and filtered to include only those reviews written in English. This cleaned and filtered data is stored in the bronze layer of our data pipeline, serving as the foundational dataset for subsequent analysis stages.
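The ingestion step can be sketched in plain Python. This is a minimal illustration, not the accelerator's own code: it assumes the public Steam store reviews endpoint (`appreviews/<app_id>?json=1`) and its documented response fields, and the helper names `build_reviews_url` and `clean_reviews` are hypothetical. Fetching the URL and writing the rows to the bronze layer are left out.

```python
from urllib.parse import urlencode

# Assumption: the public Steam store reviews endpoint, not named in this article.
STEAM_REVIEWS_URL = "https://store.steampowered.com/appreviews/{app_id}"


def build_reviews_url(app_id, cursor="*", language="english", num_per_page=100):
    """Build the request URL for one page of reviews for a Steam app."""
    params = {
        "json": 1,
        "filter": "recent",
        "language": language,
        "num_per_page": num_per_page,
        "cursor": cursor,  # "*" asks for the first page
    }
    return STEAM_REVIEWS_URL.format(app_id=app_id) + "?" + urlencode(params)


def clean_reviews(payload, language="english"):
    """Drop empty, duplicate, or non-target-language reviews and flatten the rest."""
    rows = []
    seen_ids = set()
    for review in payload.get("reviews", []):
        text = (review.get("review") or "").strip()
        review_id = review.get("recommendationid")
        if not text or not review_id or review_id in seen_ids:
            continue  # corrupt, empty, or already ingested
        if review.get("language") != language:
            continue  # keep only reviews in the target language
        seen_ids.add(review_id)
        author = review.get("author") or {}
        rows.append({
            "review_id": review_id,
            "steamid": author.get("steamid"),
            "playtime_forever": author.get("playtime_forever"),
            "voted_up": review.get("voted_up"),
            "timestamp_created": review.get("timestamp_created"),
            "review": text,
        })
    return rows
```

Each response also carries a `cursor` value that can be passed back into `build_reviews_url` to request the next page.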
2. Sentiment Analysis Pipeline
In this section, we create a data processing pipeline using Spark NLP. It begins by structuring and cleaning the text, then identifies sentences and breaks them into individual words, ensuring uniformity in representation. After standardizing the words and removing common but non-informative terms (stop words), it enriches the text by embedding words into a numerical vector space, facilitating deeper linguistic analysis. Additionally, it leverages a pre-trained model from John Snow Labs to automatically detect positive, negative and neutral aspects of the game from user reviews. Instead of labeling the entire review as negative or positive, this model identifies the sentiment of specific phrases within the review.
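To make the stages concrete, here is a toy sketch in plain Python rather than Spark NLP. It mirrors sentence detection, tokenization, normalization and stop-word removal, then swaps the embeddings and the pre-trained John Snow Labs model for a tiny hand-written lexicon. The word lists and function names are hypothetical and exist only to show what aspect-level (rather than review-level) labels look like.

```python
import re

# Hypothetical, deliberately tiny word lists. The real pipeline relies on
# Spark NLP annotators and a pre-trained model instead of these lookups.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "but", "and", "it", "this", "to", "of"}
ASPECT_TERMS = {"matchmaking", "servers", "graphics", "story", "combat"}
POSITIVE = {"great", "fun", "smooth", "beautiful"}
NEGATIVE = {"slow", "laggy", "broken", "boring"}


def split_sentences(text):
    """Sentence detection: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalize_tokens(sentence):
    """Tokenize, lowercase, strip punctuation, and remove stop words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def aspect_sentiments(review):
    """Label each aspect term with the sentiment of the sentence it appears in."""
    results = []
    for sentence in split_sentences(review):
        tokens = normalize_tokens(sentence)
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        results.extend((t, label) for t in tokens if t in ASPECT_TERMS)
    return results


print(aspect_sentiments("The combat is great. Matchmaking is slow and the servers are laggy!"))
# [('combat', 'positive'), ('matchmaking', 'negative'), ('servers', 'negative')]
```

A single review yields both a positive and two negative aspects here, which is the detail a whole-review label would hide.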
3. Author Based Clustering
Moving to the next section of our sentiment analysis solution, we employ k-means clustering to segment the authors of the gaming reviews based on their metadata. This clustering is executed using PySpark's MLlib, which efficiently handles large datasets by distributing the computation across multiple nodes. This segmentation adds a layer of granularity to our dashboard, enabling deeper insights into different user demographics and behaviors.
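The clustering idea can be illustrated without a cluster. The sketch below is a small, single-machine k-means (Lloyd's algorithm) in plain Python, not PySpark MLlib. The three author features (games owned, reviews written, hours played) are assumed for illustration, and the deterministic farthest-point seeding is a simplification chosen to keep the example reproducible. MLlib's own `KMeans` uses a different, parallel initialization by default.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_max_scale(rows):
    """Rescale every feature to [0, 1] so no single feature dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)] for row in rows]


def init_centroids(points, k):
    """Deterministic seeding: take the first point, then repeatedly add the
    point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, max_iterations=50):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = init_centroids(points, k)
    for _ in range(max_iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(d) / len(members) for d in zip(*members)] if members else centroids[i])
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical author metadata: [games_owned, reviews_written, hours_played]
authors = [[5, 1, 10], [8, 2, 25], [6, 1, 15], [300, 40, 2000], [250, 35, 1800], [320, 50, 2200]]
labels, _ = kmeans(min_max_scale(authors), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The two segments that fall out here could be read as casual and veteran reviewers. On real data, the choice of `k` and of features would need validation.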
4. Analyzing the Results
Now that you have your labeled data, you can put it to use. A product manager might look at this dataset, see high negativity related to a specific game feature, and adjust their pipeline to address it more quickly. Someone in operations might look at where complaints about server dropouts are concentrated geographically to identify potential multiplayer server orchestration issues across markets. A LiveOps content creator might find more positivity on BFGs and invest more time building skins for those products.
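As a rough illustration of the product manager's view, the sketch below ranks aspects by their share of negative mentions. It assumes the labeled data has been reduced to `(aspect, sentiment)` pairs. That shape, the function names, and the `min_mentions` threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def sentiment_by_aspect(labeled_rows):
    """Count sentiment labels per aspect from (aspect, sentiment) pairs."""
    counts = defaultdict(Counter)
    for aspect, sentiment in labeled_rows:
        counts[aspect][sentiment] += 1
    return counts


def most_negative_aspects(labeled_rows, min_mentions=2):
    """Rank aspects by share of negative mentions, skipping rarely mentioned ones."""
    ranked = []
    for aspect, counter in sentiment_by_aspect(labeled_rows).items():
        total = sum(counter.values())
        if total >= min_mentions:
            ranked.append((aspect, counter["negative"] / total, total))
    # Highest negative share first; break ties by mention count, then name.
    return sorted(ranked, key=lambda row: (-row[1], -row[2], row[0]))


rows = [
    ("servers", "negative"), ("servers", "negative"), ("servers", "positive"),
    ("skins", "positive"), ("skins", "positive"),
    ("matchmaking", "negative"),
]
for aspect, negative_share, mentions in most_negative_aspects(rows):
    print(f"{aspect}: {negative_share:.0%} negative across {mentions} mentions")
```

The `min_mentions` cutoff keeps a single angry review from pushing a rarely discussed feature to the top of the list.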
5. Take Advantage of Your New Dataset
You now have a dataset that gives you insight into what your players are saying at scale. This could be used to help personalize the experience of your players and increase retention. By taking this as an input and connecting it to your internal datasets on engagement and revenue, you can inform action by community managers, customer support and marketing, and offer recommendations. Acquiring players is expensive, and finding the players you want to keep is challenging. This insight provides an opportunity to engage with your community, build a deeper relationship with them and, by doing so, improve your player retention.
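One way to picture that connection is a simple join between review sentiment and an internal player profile. The sketch below is illustrative only. It assumes both datasets are already keyed by the same player ID, and the `lifetime_spend` field, the spend threshold and the function name are hypothetical.

```python
def flag_at_risk_players(review_sentiment, player_profiles, min_lifetime_spend=100.0):
    """Return high-value players whose review sentiment is negative.

    review_sentiment: {player_id: "positive" | "neutral" | "negative"}
    player_profiles:  {player_id: {"lifetime_spend": float, ...}}
    """
    flagged = []
    for player_id, sentiment in review_sentiment.items():
        profile = player_profiles.get(player_id)
        if profile is None:
            continue  # reviewer could not be matched to an internal record
        if sentiment == "negative" and profile["lifetime_spend"] >= min_lifetime_spend:
            flagged.append({
                "player_id": player_id,
                "lifetime_spend": profile["lifetime_spend"],
                "sentiment": sentiment,
            })
    # Highest spenders first, so outreach can be prioritized.
    return sorted(flagged, key=lambda row: -row["lifetime_spend"])


sentiment = {"p1": "negative", "p2": "positive", "p3": "negative", "p4": "negative"}
profiles = {
    "p1": {"lifetime_spend": 250.0},
    "p2": {"lifetime_spend": 900.0},
    "p3": {"lifetime_spend": 20.0},
}
print([row["player_id"] for row in flag_at_risk_players(sentiment, profiles)])  # ['p1']
```

A list like this could feed a community manager's or support team's queue.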
Conclusion
This solution accelerator for Player Review Analysis combines natural language processing and machine learning techniques to help game developers understand their players better and respond through their game design, backend operations, LiveOperations, marketing and, truly, all lines of business. A game company in pre-production that is looking to build something new might analyze similar games to find hot buttons (positive and negative) for their target players. A studio might use it during beta to quickly respond to feedback across all players, or post-launch to continuously improve the title over time and maximize engagement.
This solution accelerator (available on GitHub) is focused on the analysis of Steam reviews, but that’s just one data source. This approach can be used to analyze reviews from other sites, forums, support tickets, surveys, indeed any plain text feedback you have access to. As long as you can collect it and ingest it into this workflow, it can be used.
Feedback is a gift. We are excited to help those voices be heard, grow player engagement and assist as you further the fun.