CUSTOMER STORY

Democratize insight with generative AI and knowledge graphs

SOLUTION: Model Training
PLATFORM USE CASE: Mosaic AI

Stardog’s Enterprise Knowledge Graph platform was designed to make it easier for users to access data within their organization. However, not all users have the specialized skills needed to query data. That’s why Stardog turned to Databricks Mosaic AI to develop Voicebox. This innovative solution brings a conversational interface to their knowledge graph platform while removing dependencies on data location and structure. Using large language models fine-tuned with Mosaic AI, Voicebox acts as a conversational AI semantic layer that better connects users with their data through simple natural language prompts. Positive feedback from early-access customers has led Stardog to expand access to Voicebox, empowering all Stardog Cloud users to use plain language to query their enterprise data.

Enterprise data access hampered by skill requirements and dependencies

Data silos and data sprawl are huge challenges for enterprises, resulting in miscommunication, inefficiencies and hurdles to innovation. Stardog developed their knowledge graph technology to resolve these issues by pulling together myriad data sources to give manufacturing, life sciences and financial services enterprises a unified view of their data in its business context. Their partnership with Databricks began on Partner Connect.

“We help customers create a contextualized view of their data stored both in and outside Databricks,” says Evren Sirin, CTO and Co-founder of Stardog. “By ‘contextualized,’ we mean conceptual relationships that tie data into a network of information that means something to business users. This semantic layer is a powerful part of a fabric that includes the Databricks Platform.”

Previously, knowledge graphs required specialized skills to properly access and query data. Stardog has built tools to streamline the process by developing applications such as Stardog Designer, which makes data modeling easier than ever, and Stardog Explorer, which allows any business user to visually explore and query their enterprise data. Meanwhile, generative AI tools have revolutionized how business users can get answers to their questions, creating true self-service analytics.

That’s why Stardog developed Voicebox for their knowledge graph platform. This conversational, AI-driven interface is designed to simplify data access and exploration through the use of natural language prompts. Initially, Stardog experimented internally with OpenAI and other solutions to power Voicebox. The company quickly found that the quality of the outputs was not at the level required for more complex queries. They also recognized that an in-house approach, rather than sending data to a third-party model, would be essential.

“Off-the-shelf open source models aren't that good for the kinds of tasks we want them to do,” says Sirin. “The margin of error is very low for us and our customers, and fine-tuning is a must.” Realizing that they needed more heavily customized models to support their needs, Stardog pivoted toward Mosaic AI Training.

“We quickly learned that if you have a smaller model that you fine-tune for a specific task, you can match or exceed the quality of OpenAI, and you have more control over data security and privacy,” adds Sirin.

Databricks and Mosaic drive generative AI advancements

Already a Databricks partner, Stardog used Mosaic AI Training to develop a customized generative AI model that powers their Voicebox interface and better handles conversational queries, eliminating the learning curve for querying data and sharing it across teams. Now, when users want to access their data, they can use a side panel in Voicebox that offers sample questions based on the type of data they are querying. One example: “List in descending order the average vehicle MSRP by year for the last five years.”

Alternatively, users can type in their questions conversationally and get the applicable data back in real time. The large language model (LLM) first identifies the core concepts in a user’s natural language query and then sends them to a vector database, which aligns those terms with actual concepts in the structured data sources. The aligned concepts are then sent back to the model to display the right output in the knowledge graph. Mosaic AI Training plays a crucial role in this process by improving the quality of the queries being processed. Additionally, Databricks’ version management and zero-downtime deployment capabilities have allowed Stardog to iterate and deploy new versions quickly and effectively, leading to faster innovations.
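
To make the concept-alignment step more concrete, here is a minimal sketch in Python. It is an illustration only, not Stardog's implementation: the character-trigram "embedding" stands in for a learned embedding model, an in-memory list stands in for the vector database, and the names `SCHEMA_CONCEPTS`, `embed`, `cosine` and `align` are hypothetical. The terms an LLM would extract from the user's question are hard-coded.

```python
import math
from collections import Counter

# Hypothetical schema concepts; a real system would index the
# classes and properties of the knowledge graph instead.
SCHEMA_CONCEPTS = ["Vehicle", "MSRP", "Model Year", "Manufacturer"]


def embed(text):
    """Toy embedding: character-trigram counts (assumption, not a real model)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(term, concepts=SCHEMA_CONCEPTS):
    """Return the schema concept most similar to an extracted term."""
    query = embed(term)
    return max(concepts, key=lambda concept: cosine(query, embed(concept)))


# Terms an LLM might pull out of "average vehicle MSRP by year"
# (hard-coded here; the extraction itself is not modeled).
extracted_terms = ["vehicles", "msrp", "year"]
aligned = {term: align(term) for term in extracted_terms}
print(aligned)
```

In this sketch, the aligned concepts would then be handed back to the model so that the final structured query refers to names that actually exist in the data sources, rather than to the user's original wording.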

“We needed a way to serve our ML models without a ton of manual overhead,” adds Sirin. “Databricks has been a huge time-saver in this regard.”

Looking ahead to AI-driven enterprise data access

While Stardog’s Voicebox solution is still in its early days, customers are already excited, and the company has grand plans for it. With the combination of data and AI tools from Databricks, LLMs are now faster, more secure and more cost-effective to build. Stardog can use a single platform to manage all aspects of the ML lifecycle, which is key to further improving an AI-driven product like Voicebox. Mosaic AI Training, in turn, gives Stardog the tools it needs to make Voicebox significantly more impactful for its enterprise customers.

“It's been super easy to make progress in the Databricks ecosystem. We can see the learning curve of our models and how things change throughout training,” states Sirin. “The visibility into the process has made our development of Voicebox much more efficient and means we’ll be able to bring incredible AI-powered solutions to the market that much faster.”