VisitBritain is the official tourism website for the United Kingdom, designed to help visitors plan their trips and get recommendations on top destinations, both historic and modern. The VisitBritain team faced new challenges after the COVID-19 pandemic changed how and why people chose to visit the UK. Other macro trends, like climate change (hotter summer temperatures) and demographics (increased life expectancy), were also affecting travel forecasting. VisitBritain knew they needed to stay up to date and adapt their approaches to meet the changing needs of travelers. Working with Redshift (an Accenture company), the answer became clear: implementing data and AI tools would let them pivot quickly and effectively.
Primary research from traveler surveys expands understanding of traveler sentiment beyond mobility data (footfall), spending data (from credit card companies), and hotel and flight information, all of which require an inferential leap to understand why people travel. Traditional surveys from third-party agencies often overlook valuable insights by focusing on pre-coded, multiple-choice responses instead of open-ended answers. However, open-ended free-text data presents a new analysis challenge.
At VisitBritain, we wanted to increase the number of tourists using our services. We rely on advertising campaigns to engage and inspire visitors. To evaluate campaign impact, we conduct market research that generates vast volumes of free-text responses from tourists. Historically, extracting insights from these responses has been a highly manual and lengthy process; often, the insights arrive too late to influence current campaigns. Nor is it a consistent, impartial process. Responses in multiple languages add a further layer of complexity because of the translation step. The result is a continual struggle to capture the nuanced perspectives and sentiments of our survey respondents.
We needed a solution that could streamline this analysis process and improve our understanding of tourist sentiment so we could bolster campaign-related decision-making while weeding out non-informative responses.
“We wanted to leverage GenAI to restructure our sentiment data to make it easy to access and query, but also to find things that we otherwise wouldn't know. We created an instant data thermometer for our primary research. Rather than committing days or even weeks to analyze data quality, we can get a data quality score within seconds.”— Satpal Chana, Deputy Director of Data and Analytics and Insight, VisitBritain
To address the challenge at hand, we utilized the power of “Viewpoint,” our bespoke enterprise data intelligence platform, together with Databricks Mosaic AI, which let us use several large language models (LLMs), such as OpenAI's GPT-4, instead of traditional natural language processing (NLP) tools. We did this for three main reasons:
Next, we prepped the data by translating it (as necessary) and filtering out low-quality responses. In a typical survey of 1,900 visitors, we asked 7 free-text questions and received 27K free-text answers; we filtered out any responses graded “poor” or “useless” and kept those graded “excellent” or “vague.” For example, a response received in German that said “Mir fällt nichts ein” was first translated to “I can't think of anything” and then graded as useless.
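A minimal sketch of what this translate-then-grade step can look like with an OpenAI-compatible client; the prompt wording, response format, and helper name are illustrative assumptions rather than the exact VisitBritain implementation:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

GRADING_PROMPT = """Translate the survey response to English if it is not
already English, then grade its usefulness as exactly one of:
excellent, vague, poor, useless.
Reply in the form: <translation> | <grade>

Response: {response}"""

def translate_and_grade(response_text: str) -> tuple[str, str]:
    """Return (english_text, quality_grade) for one free-text answer."""
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": GRADING_PROMPT.format(response=response_text)}],
        temperature=0,  # keep grading deterministic
    )
    translation, grade = completion.choices[0].message.content.rsplit("|", 1)
    return translation.strip(), grade.strip().lower()

# Only "excellent" and "vague" responses move on to sentiment analysis.
text, grade = translate_and_grade("Mir fällt nichts ein")
keep = grade in {"excellent", "vague"}
```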
For the 48% of responses we kept, we then used the LLM to examine sentiment, emotion, and the topics mentioned. The model graded sentiment as positive or negative, classified the emotional content of the response, and assigned the topic to one of three pre-defined categories. The LLM then ranked the topics by prevalence across the responses, and we fed the scores into gold-level tables within the Databricks medallion architecture. Some of the most useful data came from critical responses: for example, a response that mentioned the high cost of an activity indicated that we should include more messaging around value in future advertising. We used few-shot prompting to derive relevance scores and sentiment polarity, with different LLMs assigned to these tasks. Finally, we asked the LLMs to create topic-level and campaign-level summaries of the responses.
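The following sketch shows the shape of this few-shot scoring step and the write to a gold table; the example answers, emotion labels, topic names, and table name are illustrative assumptions (the source only states that topics fell into three pre-defined categories):

```python
from openai import OpenAI
from pyspark.sql import SparkSession

client = OpenAI()
spark = SparkSession.builder.getOrCreate()

# Few-shot prompt: two worked examples teach the model the output format.
FEW_SHOT_PROMPT = """Classify this survey response.

"The castles were breathtaking" ->
sentiment: positive | emotion: awe | topic: attractions

"Train tickets cost far too much" ->
sentiment: negative | emotion: frustration | topic: value

"{response}" ->"""

def score_response(text: str) -> dict:
    """Return sentiment polarity, emotion, and topic for one response."""
    out = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": FEW_SHOT_PROMPT.format(response=text)}],
        temperature=0,
    ).choices[0].message.content
    sentiment, emotion, topic = (part.split(":", 1)[1].strip()
                                 for part in out.split("|"))
    return {"response": text, "sentiment": sentiment,
            "emotion": emotion, "topic": topic}

# kept_responses: the filtered answers from the grading step above.
rows = [score_response(t) for t in kept_responses]
# Land the scores in a gold-layer table for downstream reporting.
spark.createDataFrame(rows).write.mode("append").saveAsTable("gold.survey_sentiment")
```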
To evaluate the results of our AI agent system, we had three primary options: human-in-the-loop review, comparison against a ground truth dataset, and using an LLM as a judge.
Other than relevancy scoring and summarization, we primarily relied on LLM-as-a-judge for our evaluation metrics. We had a training dataset that served as a source of ground truth while we were developing and testing different functionalities. Once we were happy with the initial results, we would compare them to a registered model on the test dataset so we weren't overfitting to our ground truth data. At one point, we hit a plateau in the quality of responses. We reviewed our ground truth dataset, which had relied on human-in-the-loop review, found some inconsistencies, and then corrected how we were reviewing responses based on insights from our LLMs.
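A simplified sketch of the judge step, assuming a 1-5 agreement scale; `candidate` and `test_set` are hypothetical stand-ins for the model under test and the labeled held-out data:

```python
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading a sentiment classifier.
Ground-truth label: {truth}
Model output: {prediction}
Score their agreement from 1 (contradicts) to 5 (matches exactly).
Reply with the number only."""

def judge(truth: str, prediction: str) -> int:
    """Have the judge LLM score one prediction against the ground truth."""
    out = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(truth=truth,
                                                  prediction=prediction)}],
        temperature=0,
    ).choices[0].message.content
    return int(out.strip())

# Average judge scores over the held-out test set to compare a candidate
# against the registered model without overfitting to the ground truth.
scores = [judge(ex["label"], candidate(ex["text"])) for ex in test_set]
mean_score = sum(scores) / len(scores)
```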
We began our data transformation journey about two years ago; we had a strong vision of where we wanted our data to be and how we wanted to use it. We evaluated several data architectures to see what would best support our needs. Ultimately, we selected Databricks due to the strength of their future roadmap. We had confidence that any relevant features we might need would be available in Databricks in the future. This confidence was well-placed, as we were able to quickly deploy our GenAI-based data thermometer. We also appreciated the modular, open-source approach of Databricks, which made our development and evaluation process much easier.
Digging into our current architecture, we store our data in Databricks and rely on Unity Catalog for permission-based access, so users can query production data from development environments. MLflow, integrated into Databricks, lets us easily compare LLM results side by side and use LLM-as-a-judge as a low-code way to evaluate data at scale.
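As a sketch of that side-by-side comparison, each endpoint's outputs and judge scores can be logged to a separate MLflow run; the endpoint names and the `score_with`, `mean_judge_score`, and `sample_responses` helpers are hypothetical stand-ins:

```python
import mlflow

# Compare two candidate endpoints on the same sample of survey answers.
for endpoint in ["gpt-4", "databricks-dbrx-instruct"]:
    with mlflow.start_run(run_name=f"sentiment-{endpoint}"):
        mlflow.log_param("endpoint", endpoint)
        outputs = [score_with(endpoint, text) for text in sample_responses]
        mlflow.log_metric("judge_score_mean", mean_judge_score(outputs))
        # Log raw outputs so runs can be inspected side by side in the MLflow UI.
        mlflow.log_table({"response": sample_responses, "output": outputs},
                         artifact_file="outputs.json")
```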
“The Databricks Data Intelligence Platform allowed us to easily compare different models and the sorts of outputs we were getting from them.”— Satpal Chana
“The best part of this project has been getting insight from sources that we never would've found otherwise. Even colleagues who have extensive knowledge of these data assets are finding things they didn’t expect to find, after just one pass.”— Satpal Chana
We have seen some unexpected value from this project; for example, other teams are able to leverage this proof of concept to evaluate responses to other surveys. Another benefit has been the ability to improve our survey process: when people submit responses outside of a drop-down list, we can now glean information from their free-text answers that helps us shape more pertinent questions going forward. Looking ahead, the fact that Databricks is at the forefront of innovation is key. For example, we can easily switch between model endpoints, which allows us to iterate on the latest and greatest GenAI technology and support the needs of the tourism industry in the UK, now and in the future.
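To illustrate how lightweight that switch can be: Databricks Model Serving exposes OpenAI-compatible endpoints, so pointing a pipeline at a new model can be a one-line change. The workspace host, token handling, and endpoint name below are placeholders, not a production configuration:

```python
from openai import OpenAI

# Placeholders: supply your own workspace host, token, and endpoint name.
client = OpenAI(
    api_key="<databricks-token>",
    base_url="https://<workspace-host>/serving-endpoints",
)

MODEL = "databricks-meta-llama-3-3-70b-instruct"  # swap endpoints by changing this name

reply = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user",
               "content": "Summarize this survey response: ..."}],
)
print(reply.choices[0].message.content)
```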