Skip to main content
CUSTOMER STORY

Shifting the car buying experience into high gear

Edmunds makes it easier to find quality dealers with Databricks and GenAI

3–5

Hours saved per week with review auto-moderation

2

Number of moderators now needed to assess 300+ reviews

CLOUD: AWS

“Databricks empowers us to develop cutting-edge generative AI solutions efficiently — without sacrificing data security or governance.”

— Greg Rokita, Vice President of Technology, Edmunds

A trusted guide in online car shopping, Edmunds is always looking for ways to deliver insightful content to make the car buying experience even better. Edmunds embraced generative AI capabilities to revolutionize their approach to identifying and moderating “dealer quality of service” reviews. By employing a GenAI model, they’ve streamlined the process, allowing for the automatic parsing and analysis of hundreds of daily reviews for quicker online publication. This transition marks a significant departure from the previously manual method, saving valuable staff time and enabling them to focus on more essential tasks. With the Databricks Data Intelligence Platform, Edmunds can now match the highest-quality dealers with the right customers, facilitating smarter purchasing decisions each and every time.

Manual moderation and data governance overhead cause delays in review publication

With over 300 daily reviews submitted on both the quality of new and used cars and the dealers selling them, the moderators at Edmunds faced challenges in making sure prospective buyers had the information they needed to make an informed car purchasing decision in a timely fashion.

“Dealer service reviews were being moderated manually and then published to the site on approval. This would take days to comb through. We were looking to see if we could use GenAI to parse through and auto-moderate such reviews,” explained Suresh Narasimhan, Technical Consultant on the API platform team at Edmunds. The hands-on process of sifting through all reviews and moderating for the best reviews was time-consuming, sometimes requiring up to 72 hours of turnaround time to publish vetted reviews. The team wanted to rev up the process, but switching to a generative AI-powered solution wasn’t as simple as shifting gears to Drive. The solution needed to be trained to discern ambiguous reviews and determine if they were specifically meant for “dealer quality of service.” In addition, Edmunds needed to streamline the governance of data pipelines used in their content moderation system.

Implementing a GenAI solution to automate reviews 

Originally, Narasimhan dedicated time and resources to training an off-the-shelf model to deliver accurate, performant outputs. “The results were not great,” Narasimhan put it simply. “The rules to moderate the reviews were complex, and even fine-tuning did not deliver the results we needed. I had to capture all the rules in the prompt to achieve the desirable result, so there was no flexibility for edge cases.” He then experimented with prompt engineering of off-the-shelf models but found it challenging to compare the outputs of different models.

Narasimhan turned to Databricks to experiment with alternatives, specifically Databricks Model Serving, which consolidates widely used third-party LLM providers in the same view as custom-served models within a unified environment where users can easily manage permissions and set rate limits. “Using Databricks, it became very easy to switch between commercially available models and compare the results to see which one worked better.”

Concluded Narasimhan, “What we are using now is GPT-4 that is called through Databricks Model Serving endpoints with a lot of custom prompts on how to moderate it, and that has worked the best for us.” Their custom prompt instructions direct the model whether to accept or reject a review — in a matter of seconds, not hours.

Resolving data governance overhead with the Data Intelligence Platform

Edmunds Staff Engineer Sam Shuster shared his pain regarding data governance. “Having to use IAM roles to govern access to data resulted in a lot of overhead in IAM for very coarse access. And unless we did an extensive search in GitLab and Slack, we had little insight into the dependencies of our pipelines.” This was before Edmunds turned to the Databricks Data Intelligence Platform to power their generative AI workloads. To solve data governance issues, Shuster’s team decided to migrate to Databricks Unity Catalog in place of using their existing workspaces. They used external tables for the majority of their important pipelines, so they created metadata sync scripts to keep these tables in sync with Unity Catalog — all while not having to worry about keeping the actual data in sync themselves. After Shuster’s team migrated some of the core pipelines, the rest of the teams at Edmunds could switch to the new Unity Catalog cluster policies at their convenience over the course of a year.

“With Unity Catalog, the ability to manage both table and even Amazon S3 access more like a traditional database allows us to have much finer-grained access control than what we had before,” said Shuster. “We also have more documented lineage for our pipelines and an account-level metastore. All this granularity is why we migrated off of Hive metastore.”

Faster review moderation, improved data quality

By enabling the auto-moderation of dealer reviews, Narasimhan estimates that the team saves three to five hours per week. With GenAI models accessed through Model Serving, Edmunds moderators can now analyze and publish new reviews in minutes — a far cry from a three-day turnaround. And they now need only two moderators, freeing up these resources for more valuable tasks.

Migrating to Unity Catalog has also been successful. “Since adopting Unity Catalog, we have seen many quality improvements in our environment. These benefits have included improved auditing, compliance, security, reduction of operational overhead and improved data discovery,” added Shuster. “From a security perspective, Unity Catalog has allowed us to greatly simplify how we control data access while also delivering better access controls. Being able to programmatically query for lineage has meant fewer incidents due to changes in pipelines breaking downstream jobs, as well as better documentation for our users.”

Full speed ahead on generative AI

Greg Rokita, VP of Technology at Edmunds, believes that generative AI will continue to influence the business — with Databricks continuing to play a role. “Instead of treating data warehousing and AI/ML as separate tasks with different systems, Databricks lets us see them as two sides of the same coin. Traditionally, data warehouses store past data, while AI models predict future data. Databricks unifies these by creating a single timeline that includes both historical information and forecasts.”

Buoyed by the resounding success of this initial implementation, Edmunds is now set to expand this AI-driven approach across all their reviews.