Skip to main content
CUSTOMER STORY

Deriving meaningful insights for research and policy impact

Vanderbilt University uses Databricks to enhance AI-driven academic research

2.7x

Increase in research citations

~8PB

In data volume

INDUSTRY: Public Sector
CLOUD: AWS

The Jean and Alexander Heard Libraries at Vanderbilt University provide access to a world of resources that academic institutions use to conduct research and teach in the classroom. The Heard Libraries struggled with providing access to vast datasets like their Vanderbilt Television News Archive, limiting the ability of faculty and researchers to apply advanced analytical tools like natural language processing (NLP) to extract insights and understand context without the content. To help institutions access, analyze and understand context from TV broadcasts and other data at scale, they chose the Databricks Data Intelligence Platform. Vanderbilt University has since transformed their delivery of historical information, enabling secure and efficient access to digitized collections of media — allowing for the synthesis of historical content for decision-making in today’s world and the ability to preserve it for future generations.

Lack of advanced analytics makes surfacing culture impossible

The Jean and Alexander Heard Libraries at Vanderbilt University are revolutionizing how research is conducted by transforming library services into a highly interactive and intelligent hub. This initiative aims to provide seamless access to a world of resources for teaching and research, fostering an environment where collaboration and innovation thrive. Vanderbilt’s ethos as exemplified by the libraries, termed “radical collaboration,” is to bring together curated data, advanced AI tools and data expertise to support and enhance research activities.

Hosting collections of televised network news programs since 1968 alongside millions of journals from major publishers, the Jean and Alexander Heard Libraries at Vanderbilt University receive about 15 million uses of their resources annually. In the academic sphere, conducting comprehensive and multidisciplinary research often encounters significant technological hurdles. At Vanderbilt University, the Heard Libraries faced challenges in harnessing the power of their extensive Television News Archive to support diverse research needs. Scholars sought to run their natural language processing tools on the data, but the Heard Libraries did not have the capacity — or the full-text transcripts — to allow this level of analysis.

One prominent use case involved researching the impact of presidential executive orders through historical news coverage. This research required not only access to accurate and extensive datasets but also the ability to integrate these with other datasets such as polling data and public policy documents. Before Databricks, accessing and manipulating this large-scale data was cumbersome, with limited capabilities to handle complex queries that spanned multiple data types and periods of time.

Vanderbilt’s researchers wanted to compare TV news coverage with external data sources, like stock market trends or climate change policies, to explore broader socioeconomic trends. These sophisticated research queries demanded a robust, scalable platform capable of managing vast data volumes and diverse data formats.

Jim Duran, Director of the Vanderbilt Television News Archive at Vanderbilt University, stated, “With a gift from Suzanne Perot McGee, Patrick K. McGee and their family, we were able to transcribe all of the video content and deploy Databricks to provision secure access to our researchers. This has allowed them to run machine learning research and experiments, pairing the TV News Archive with other datasets to explore complex issues like the effects of executive orders on public perception and policy.”

Using AI to analyze decades of broadcast news at scale

The Databricks Data Intelligence Platform has revolutionized access to and manipulation of large historical datasets. This transition was crucial for facilitating complex research activities, notably those involving natural language processing. By employing NLP techniques, researchers can now analyze vast volumes of data from decades of broadcast news to derive meaningful insights into public discourse and policy impact.

Databricks democratized data access and the use of advanced NLP tools among researchers, including those without deep technical expertise in data science. “The platform’s intuitive interface allows researchers to focus more on the discovery, exploration and analysis of their research topics rather than the underlying technical complexities,” explained Vanderbilt University Librarian Jon Shaw.

“Databricks has been instrumental in transforming how we manage our TV News Archive. The platform not only allows for efficient data ingestion and processing but also enables our researchers to integrate and analyze data in ways that were previously not possible,” added Duran. “This has opened up new avenues for research and has significantly enhanced our ability to support interdisciplinary studies.”

Databricks supports the Heard Libraries’ need for secure and efficient data management, crucial for maintaining data integrity and confidentiality, through features such as Unity Catalog for governance and Delta Live Tables for automatic ingestion of new data. These tools facilitate seamless collaboration among researchers and support the integration of diverse data sources, which is essential for NLP.

Democratizing AI with Databricks for 2.7x more credibility

With Databricks, Vanderbilt’s libraries have overcome initial data access and analysis challenges and greatly expanded their research capabilities, allowing for more complex, data-driven projects. This has reinforced the Heard Libraries’ position as a pivotal academic research and innovation hub.

The libraries have reported a dramatic increase in the usage of their resources. Before implementation of the Databricks Platform, the Vanderbilt Television News Archive was utilized by thousands. Shaw now believes it’s poised to engage hundreds of thousands. “Databricks has allowed for as much Vanderbilt scholarship to be as open access as possible,” Shaw said. “Now, Vanderbilt is being cited approximately 2.7 times more often than before.” This underscores the broader impact of Vanderbilt’s credibility on the global academic community, enhancing the dissemination of knowledge across disciplines and borders.

Looking ahead, Shaw and Duran are galvanized to leverage the Databricks AI/ML suite to augment the data even further. “One of the greatest goals would be to label an hourlong Fox News, MSNBC or CNN program with news stories, commercials, speakers, locations and even topics for better analysis,” described Duran. 

“We want to apply AI to start picking up individual emotions to create even more context,” added Shaw excitedly. “We have this 50-year history of an archive that is only expanding. With Databricks, we have great possibilities for a richer dataset we can preserve in perpetuity over time.”

The Vanderbilt Television News Archive is just the first of many such efforts that will take place. The larger vision is to evolve how the Heard Libraries will support research innovation by providing both curated data for researchers and a consolidated data and AI platform. Through this innovative approach, Vanderbilt University is setting a new standard for how libraries can support research innovation, providing both the tools and expertise needed to fuel academic discovery and collaboration.