VisitBritain is the official tourism website for the United Kingdom, designed to help visitors plan their trips and get recommendations on top destinations, both historic and modern. The VisitBritain team faced new challenges after the COVID-19 pandemic changed how and why people chose to visit the UK. Other macro trends, like climate change (hotter summer temperatures) and demographics (increased life expectancy), were also affecting travel forecasting. VisitBritain knew they needed to stay up to date and adapt their approaches to meet the changing needs of travelers. Working with Redshift (an Accenture company), the answer became clear: implementing data and AI tools would let them pivot quickly and effectively.
Primary research from traveler surveys expands understanding of traveler sentiment beyond mobility data (footfall), spending data (from credit card companies), and hotel and flight information, all of which require an inferential leap to understand why people travel. Traditional surveys from third-party agencies often overlook valuable insights by focusing on pre-coded, multiple-choice responses instead of open-ended answers. However, open-ended free-text data presents a new analysis challenge.
At VisitBritain, we wanted to increase the number of tourists using our services. We rely on advertising campaigns to engage and inspire visitors. To evaluate campaign impact, we conduct market research that generates vast volumes of free-text responses from tourists. Historically, extracting insights from these responses has been a highly manual and lengthy process; often, the insights arrive too late to influence current campaigns. Nor is it a consistent, impartial process. Responses in multiple languages add a further layer of complexity because of the translation step. The result is a continual struggle to capture the nuanced perspectives and sentiments of our survey respondents.
We needed a solution that could streamline this analysis process and improve our understanding of tourist sentiment so we could bolster campaign-related decision-making while weeding out non-informative responses.
“We wanted to leverage GenAI to restructure our sentiment data to make it easy to access and query, but also to find things that we otherwise wouldn't know. We created an instant data thermometer for our primary research. Rather than committing days or even weeks to analyze data quality, we can get a data quality score within seconds.”— Satpal Chana, Deputy Director of Data and Analytics and Insight, VisitBritain
To address the challenge at hand, we utilized the power of “Viewpoint,” our bespoke enterprise data intelligence platform, together with Databricks Mosaic AI, which let us use several large language models (LLMs), such as OpenAI's GPT-4, instead of traditional natural language processing (NLP) tools. We did this for three main reasons:
Next, we prepped the data by translating it (as necessary) and filtering out low-quality responses. In a typical survey of 1,900 visitors, we asked 7 free-text questions and received 27K free-text answers; we filtered out any responses graded “poor” or “useless” and kept those graded “excellent” or “vague.” For example, a response received in German that said “Mir fällt nichts ein” was first translated to “I can't think of anything” and then graded as useless.
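A minimal sketch of what this translate-then-grade step can look like with an OpenAI-compatible client; the prompt wording, response format, and helper name are illustrative assumptions rather than the exact VisitBritain implementation:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

GRADING_PROMPT = """Translate the survey response to English if it is not
already English, then grade its usefulness as exactly one of:
excellent, vague, poor, useless.
Reply in the form: <translation> | <grade>

Response: {response}"""

def translate_and_grade(response_text: str) -> tuple[str, str]:
    """Return (english_text, quality_grade) for one free-text answer."""
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": GRADING_PROMPT.format(response=response_text)}],
        temperature=0,  # keep grading deterministic
    )
    translation, grade = completion.choices[0].message.content.rsplit("|", 1)
    return translation.strip(), grade.strip().lower()

# Only "excellent" and "vague" responses move on to sentiment analysis.
text, grade = translate_and_grade("Mir fällt nichts ein")
keep = grade in {"excellent", "vague"}
```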
For the 48% of responses we kept, we then used the LLM to examine sentiment, emotion, and the topics mentioned. The model graded sentiment as positive or negative, classified the emotional content of the response, and assigned the topic to one of three pre-defined categories. The LLM then ranked the topics by prevalence across the responses, and we fed the scores into gold-level tables within the Databricks medallion architecture. Some of the most useful data came from critical responses: for example, a response that mentioned the high cost of an activity indicated that we should include more messaging around value in future advertising. We used few-shot prompting to derive relevance scores and sentiment polarity, with different LLMs assigned to these tasks. Finally, we asked the LLMs to create topic-level and campaign-level summaries of the responses.
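The following sketch shows the shape of this few-shot scoring step and the write to a gold table; the example answers, emotion labels, topic names, and table name are illustrative assumptions (the source only states that topics fell into three pre-defined categories):

```python
from openai import OpenAI
from pyspark.sql import SparkSession

client = OpenAI()
spark = SparkSession.builder.getOrCreate()

# Few-shot prompt: two worked examples teach the model the output format.
FEW_SHOT_PROMPT = """Classify this survey response.

"The castles were breathtaking" ->
sentiment: positive | emotion: awe | topic: attractions

"Train tickets cost far too much" ->
sentiment: negative | emotion: frustration | topic: value

"{response}" ->"""

def score_response(text: str) -> dict:
    """Return sentiment polarity, emotion, and topic for one response."""
    out = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": FEW_SHOT_PROMPT.format(response=text)}],
        temperature=0,
    ).choices[0].message.content
    sentiment, emotion, topic = (part.split(":", 1)[1].strip()
                                 for part in out.split("|"))
    return {"response": text, "sentiment": sentiment,
            "emotion": emotion, "topic": topic}

# kept_responses: the filtered answers from the grading step above.
rows = [score_response(t) for t in kept_responses]
# Land the scores in a gold-layer table for downstream reporting.
spark.createDataFrame(rows).write.mode("append").saveAsTable("gold.survey_sentiment")
```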
To evaluate the results of our AI agent system, we had three primary options: human-in-the-loop review, comparison against a ground truth dataset, and using an LLM as a judge.
Other than relevancy scoring and summarization, we primarily relied on LLM-as-a-judge for our evaluation metrics. We had a training dataset that served as a source of ground truth while we were developing and testing different functionalities. Once we were happy with the initial results, we would compare them to a registered model on the test dataset so we weren't overfitting to our ground truth data. At one point, we hit a plateau in the quality of responses. We reviewed our ground truth dataset, which had relied on human-in-the-loop review, found some inconsistencies, and then corrected how we were reviewing responses based on insights from our LLMs.
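A simplified sketch of the judge step, assuming a 1-5 agreement scale; `candidate` and `test_set` are hypothetical stand-ins for the model under test and the labeled held-out data:

```python
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading a sentiment classifier.
Ground-truth label: {truth}
Model output: {prediction}
Score their agreement from 1 (contradicts) to 5 (matches exactly).
Reply with the number only."""

def judge(truth: str, prediction: str) -> int:
    """Have the judge LLM score one prediction against the ground truth."""
    out = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(truth=truth,
                                                  prediction=prediction)}],
        temperature=0,
    ).choices[0].message.content
    return int(out.strip())

# Average judge scores over the held-out test set to compare a candidate
# against the registered model without overfitting to the ground truth.
scores = [judge(ex["label"], candidate(ex["text"])) for ex in test_set]
mean_score = sum(scores) / len(scores)
```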
We began our data transformation journey about two years ago; we had a strong vision of where we wanted our data to be and how we wanted to use it. We evaluated several data architectures to see what would best support our needs. Ultimately, we selected Databricks due to the strength of their future roadmap. We had confidence that any relevant features we might need would be available in Databricks in the future. This confidence was well-placed, as we were able to quickly deploy our GenAI-based data thermometer. We also appreciated the modular, open-source approach of Databricks, which made our development and evaluation process much easier.
Digging into our current architecture, we store our data in Databricks and rely on Unity Catalog for permission-based access, so users can query production data from development environments. MLflow, integrated into Databricks, lets us easily compare LLM results side by side and use LLM-as-a-judge as a low-code way to evaluate data at scale.
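As a sketch of that side-by-side comparison, each endpoint's outputs and judge scores can be logged to a separate MLflow run; the endpoint names and the `score_with`, `mean_judge_score`, and `sample_responses` helpers are hypothetical stand-ins:

```python
import mlflow

# Compare two candidate endpoints on the same sample of survey answers.
for endpoint in ["gpt-4", "databricks-dbrx-instruct"]:
    with mlflow.start_run(run_name=f"sentiment-{endpoint}"):
        mlflow.log_param("endpoint", endpoint)
        outputs = [score_with(endpoint, text) for text in sample_responses]
        mlflow.log_metric("judge_score_mean", mean_judge_score(outputs))
        # Log raw outputs so runs can be inspected side by side in the MLflow UI.
        mlflow.log_table({"response": sample_responses, "output": outputs},
                         artifact_file="outputs.json")
```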
“The Databricks Data Intelligence Platform allowed us to easily compare different models and the sorts of outputs we were getting from them.”— Satpal Chana
“The best part of this project has been getting insight from sources that we never would've found otherwise. Even colleagues who have extensive knowledge of these data assets are finding things they didn’t expect to find, after just one pass.”— Satpal Chana
We have seen some unexpected value from this project; for example, other teams are able to leverage this proof of concept to evaluate responses to other surveys. Another benefit has been the ability to improve our survey process: when people submit responses outside of a drop-down list, we can now glean information from their free-text answers that helps us shape more pertinent questions going forward. Looking ahead, the fact that Databricks is at the forefront of innovation is key. For example, we can easily switch between model endpoints, which allows us to iterate on the latest and greatest GenAI technology and support the needs of the tourism industry in the UK, now and in the future.
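To illustrate how lightweight that switch can be: Databricks Model Serving exposes OpenAI-compatible endpoints, so pointing a pipeline at a new model can be a one-line change. The workspace host, token handling, and endpoint name below are placeholders, not a production configuration:

```python
from openai import OpenAI

# Placeholders: supply your own workspace host, token, and endpoint name.
client = OpenAI(
    api_key="<databricks-token>",
    base_url="https://<workspace-host>/serving-endpoints",
)

MODEL = "databricks-meta-llama-3-3-70b-instruct"  # swap endpoints by changing this name

reply = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user",
               "content": "Summarize this survey response: ..."}],
)
print(reply.choices[0].message.content)
```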