
Step-by-Step Guide: AI-Powered Customer Sentiment Analysis

How to use AI functions in Databricks SQL together with AI/BI Dashboards

Summary

In this blog post, we’ll follow a step-by-step guide to building a review analysis pipeline. We’ll leverage the Databricks Marketplace to import sample data, use ai_query() to build a pipeline that extracts opinions from reviews, and serve that analysis in an AI/BI dashboard. The final result is an interactive way for users to understand customer opinions from product reviews.

Analyzing data from free-form text reviews can provide critical insights into customer feedback. What do customers think about your business? What particular aspects can be improved? 

Traditionally, understanding large volumes of unstructured text data like user reviews has required a significant investment: ML engineers must train and deploy classification and/or named entity recognition models that are purpose-built for each task. With AI Functions, however, Databricks makes it possible for anyone comfortable working in SQL to get answers to these questions. No bespoke modeling is required; it’s as simple as writing a few lines of SQL.

 

opinion mining review dashboard

 

In this blog post, we’ll follow a step-by-step guide a SQL analyst can use to understand trends in freeform text reviews. We’ll leverage the Databricks Marketplace to import sample data, then use ai_query() to build a pipeline that extracts opinions from those reviews. We’ll then display and share that analysis using an AI/BI dashboard. The final result is an interactive way for users to understand customer opinions from product reviews.

 

Step-by-step process

Let’s walk through the process of mining opinions in Databricks SQL:

  1. Importing data from the Databricks Marketplace
  2. Building an opinion mining pipeline
  3. Building a dashboard

Step 1: Importing data from the Databricks Marketplace

For this example, we’ll use a sample Amazon reviews dataset from BrightData, one of the partners on the Databricks Marketplace. To access this data, navigate to the Databricks Marketplace via the left-hand navigation menu.

Databricks Marketplace

Databricks Marketplace is an open marketplace for all your data, analytics and AI, powered by the open source Delta Sharing standard. The Databricks Marketplace expands your opportunity to deliver innovation and advance all your analytics and AI initiatives. You can leverage these assets directly within your Databricks environment.

 

To find the dataset, go to the 'Search for products' search box and type in 'amazon - best seller products'. Click the first tile to see details about the dataset and provider.

 

Databricks Marketplace with Amazon best seller products

 

From there, click the blue 'Get instant access' button in the top right corner and accept the terms and conditions.

Amazon reviews in Databricks Marketplace

 

Getting instant access adds the shared dataset to Unity Catalog. Within Unity Catalog, click the 'amazon_reviews' table to see some example reviews.

Access in Unity Catalog

 

Step 2: Building an opinion mining pipeline

Databricks AI Functions provide an easy way to perform analytical tasks that would be difficult or impossible with traditional SQL. Task-specific functions such as ai_analyze_sentiment, ai_classify, ai_extract, ai_summarize, and ai_translate are supported out of the box, alongside the general-purpose ai_query().

 

These functions are helpful for quickly testing out AI capabilities on top of text data. For instance, we can call ai_analyze_sentiment to find the sentiment of each review.
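A minimal sketch of that call (the table name below is an assumption; point it at the amazon_reviews table shared in Step 1):

    SELECT
      REVIEW_TEXT,
      ai_analyze_sentiment(REVIEW_TEXT) AS sentiment  -- returns positive, negative, neutral, or mixed
    FROM amazon_reviews                               -- assumed name; use the full Unity Catalog path of the shared table
    LIMIT 10;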

   

This will assign a positive, negative, neutral, or mixed sentiment to the text. For example, here is a review that was classified as positive:

 

For more customized use cases, Databricks recommends ai_query(). It enables users to call any Databricks Model Serving model, whether a foundation, custom, or external model, and it supports prompt customization and better throughput performance when a provisioned throughput endpoint is used.
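In its simplest form, ai_query() takes the serving endpoint name and a request string. For example (the endpoint name is the one we create in the next step; the table name is an assumption, as above):

    SELECT
      ai_query(
        'llama-v3_1_70b-pt',                                            -- Model Serving endpoint name
        'Summarize the main complaint in this review: ' || REVIEW_TEXT  -- custom prompt plus the review text
      ) AS summary
    FROM amazon_reviews
    LIMIT 5;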

 

Set up a provisioned throughput endpoint

We’ll first set up a provisioned throughput endpoint for our use case to ensure optimal throughput. If you don’t have permissions to set up an endpoint, you may need to request help from your workspace admin with these next steps. Navigate to the Serving tab in the navigation menu on the left-hand side. Click the Create serving endpoint button in the top right corner to create a new endpoint.

provisioned throughput endpoint

On the setup page, we’ll fill out the relevant fields. Name the endpoint llama-v3_1_70b-pt. For the entity, choose system.ai.meta_llama_v3_1_70b_instruct. Because we’re prototyping our solution, under the advanced configuration we’ll check the Scale to zero box so the endpoint scales down to 0 tokens/second when it’s not in use. For production use, we recommend unchecking this option to ensure low-latency responses.

 

create serving endpoint

 

 

Create the endpoint to begin provisioning. Provisioning may take some time, so while we wait, we’ll work on the query itself. Navigate to the SQL Editor and create a new blank draft.

 

Review the sample data

Now, let’s examine the sample data, specifically the REVIEW_TEXT column, to get a sense of what a typical review looks like.
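A quick query along these lines pulls a few examples (the table name is an assumption, as above):

    SELECT PRODUCT_NAME, RATING, REVIEW_TEXT
    FROM amazon_reviews
    LIMIT 5;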

 

Define the SQL query

Let’s decide what we want in the output. Ideally, we could extract the critical opinions, record the sentiment, and classify the mention into a relevant category. The response might look like this:
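For example, a hypothetical shape for a single review (the field names and category values are assumptions; we define them ourselves in the prompt):

    [
      {"comment": "battery drains within a few hours", "sentiment": "Negative", "classification": "Quality"},
      {"comment": "setup was quick and easy", "sentiment": "Positive", "classification": "Usability"}
    ]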

 

To produce this output, we’ll construct a SQL query that uses ai_query() to call an LLM. We’ll create a custom prompt that instructs the LLM to classify reviews into our desired categories and provide a sample review and output to improve the quality of the response.
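A sketch of such a query (the prompt wording, JSON field names, and category list are illustrative assumptions; the endpoint is the one provisioned above):

    SELECT
      REVIEW_ID,
      PRODUCT_NAME,
      RATING,
      AUTHOR_NAME,
      REVIEW_TEXT,
      ai_query(
        'llama-v3_1_70b-pt',
        'Extract the key opinions from the product review below. Respond with ONLY a JSON array '
        || 'where each element has the fields "comment" (a short phrase), "sentiment" '
        || '(Positive, Negative, Neutral, or Mixed), and "classification" '
        || '(one of: Quality, Price, Shipping, Usability, Other). '
        || 'Example: [{"comment": "arrived two weeks late", "sentiment": "Negative", "classification": "Shipping"}] '
        || 'Review: ' || REVIEW_TEXT
      ) AS opinions
    FROM amazon_reviews
    LIMIT 10;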

 

Copy and paste this into the SQL editor. Once the model serving endpoint from the previous step has finished provisioning, we’ll run the above SQL query on ten reviews to see the results.

 

run query and see results

 

Optimize the process

Now that we know this works, we can make a few improvements.

  1. Extract the opinions from JSON into columns.
  2. Filter out any hallucinated values that don’t match our desired classifications.
  3. Create a materialized view to store these results so we don’t have to recompute the LLM outputs each time we want to look at our dashboard. You can refresh the materialized view whenever you want to incrementally compute more review opinions without needing to rescan the entire dataset.

To implement these improvements, copy and paste the following code into a new query.
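A sketch along these lines implements all three improvements (the view name, prompt, and category list are illustrative assumptions carried over from the query above):

    -- 3. materialized view stores the results so LLM outputs aren't recomputed on every dashboard load
    CREATE MATERIALIZED VIEW review_opinions AS
    WITH scored AS (
      SELECT
        REVIEW_ID, PRODUCT_NAME, RATING, AUTHOR_NAME, REVIEW_TEXT,
        ai_query(
          'llama-v3_1_70b-pt',
          'Extract the key opinions from the product review below. Respond with ONLY a JSON array '
          || 'where each element has the fields "comment", "sentiment" (Positive, Negative, Neutral, or Mixed), '
          || 'and "classification" (one of: Quality, Price, Shipping, Usability, Other). Review: ' || REVIEW_TEXT
        ) AS opinions_json
      FROM amazon_reviews
    ),
    exploded AS (
      SELECT
        REVIEW_ID, PRODUCT_NAME, RATING, AUTHOR_NAME, REVIEW_TEXT,
        explode(from_json(opinions_json,
          'ARRAY<STRUCT<comment: STRING, sentiment: STRING, classification: STRING>>')) AS opinion  -- 1. JSON into columns
      FROM scored
    )
    SELECT
      REVIEW_ID, PRODUCT_NAME, RATING, AUTHOR_NAME, REVIEW_TEXT,
      opinion.comment        AS Comment,
      opinion.sentiment      AS Sentiment,
      opinion.classification AS Classification
    FROM exploded
    WHERE opinion.sentiment IN ('Positive', 'Negative', 'Neutral', 'Mixed')                  -- 2. drop hallucinated sentiments
      AND opinion.classification IN ('Quality', 'Price', 'Shipping', 'Usability', 'Other');  -- 2. drop hallucinated categories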

 

Execute this query. It may take some time to complete, so grab a cup of coffee or tea while it runs.

 

Step 3: Building a dashboard

Finally, we want to display these results on a shared dashboard. We’ll create the dashboard below.

 

review dashboard

 

 

Create a new dashboard

Let’s create a new AI/BI dashboard by clicking on +New > Dashboard.

 

new dashboard menu

 

Double-click the dashboard title to give it a name - we’ll call it Reviews Dashboard.

 

add dashboard canvas

 

Choose the materialized view

Next, let's pull data from the materialized view we created. Go to the Data tab and click Select a table. Find the materialized view you created and import it into the dashboard.
 

materialized view results
Results from the materialized view

 

Create visualizations

Now that we have the dataset loaded, we can create some visualizations. We’ll want to answer the following questions:

  • What is the distribution of sentiment across different products?
  • Which product attributes are considered positive? Which are negative?
  • Which individual reviews are positive, and which are negative?

 

We’ll start by adding a text widget. Go back to the Canvas tab, add a Text Box, and place it at the top of the canvas. Use Markdown to give our dashboard a header: # Review Dashboard.

 

text widget

 

Now, we’re going to add some visualizations. Let’s add a row of KPIs across the top of the dashboard to show critical metrics. Click the Add a visualization tool in the bottom toolbar, and place a widget on a 1-wide by 3-tall grid. Choose Counter as the visualization type from the right-hand dropdown menu, and select count(*) as the Value. Name this widget Total Opinions.

 

Clone this first counter and place it to the right. Rename it Positive. Next, in the Filter fields menu, add a filter on Sentiment and choose Positive as the value.

 

Continue doing this to get a row of 6 widgets across the top of the dashboard for these KPIs:

  • Total Opinions
  • Positive
  • Negative
  • Mixed
  • Neutral
  • Avg Rating

 

You can optionally style these counters to make them look more appealing.
 

add row of counters
The row of counters for our review dashboard

 

Next, we’re going to add three more charts to our dashboard:

  • A bar chart showing sentiment per classification.
    • X-axis: COUNT(*)
    • Y-axis: Classification
    • Color: Sentiment
    • Layout (in the X-axis kebab menu): Grouped
  • A table showing raw results to review the actual text, with the following fields:
    • Classification
    • Comment
    • Sentiment
    • PRODUCT_NAME
    • RATING
    • AUTHOR_NAME
    • REVIEW_ID
    • REVIEW_TEXT
  • A heatmap showing the distribution of sentiment per product.
    • X-axis: PRODUCT_NAME
    • Y-axis: Sentiment
    • Color: COUNT(*)

 

visualizations added to the dashboard
Visualizations added to the dashboard

 

Finally, we’ll add filters to slice the dashboard by different dimensions. Place three filter widgets at the top of the canvas and set each one to the Multiple Values filter type. On filter 1, select the PRODUCT_NAME field to filter on; for filter 2, select Sentiment; for filter 3, select Classification. Optionally, rename the filters to something more user-friendly.

add filters to the dashboard
Adding filters to our dashboard

 

We’ve finished updating our dashboard! To see the final result, click the Publish button in the top right corner of the dashboard. From there, you can share your dashboard with additional users so anyone can see your review insights.

 

publish dashboard buttons
The publish button in the top right corner

 

Conclusion

Unstructured text data like customer reviews is a treasure trove of information about your business. Previously, analyzing text required advanced knowledge of NLP techniques. Using AI Functions and AI/BI dashboards, SQL users can seamlessly create engaging dashboards without building an AI or ML model. You can extend this pattern to any unstructured text analysis in SQL - think classification, entity extraction, translation, and more.
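For instance, the task-specific functions mentioned earlier cover those cases directly (the labels and output column names below are illustrative; the table is the same assumed reviews table):

    SELECT
      ai_classify(REVIEW_TEXT, ARRAY('complaint', 'praise', 'question')) AS review_type,  -- classification
      ai_extract(REVIEW_TEXT, ARRAY('product', 'brand'))                 AS entities,     -- entity extraction
      ai_translate(REVIEW_TEXT, 'es')                                    AS review_es     -- translation to Spanish
    FROM amazon_reviews
    LIMIT 5;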

 

Thank you for reading along. Our team would love to hear from you if you have any questions or comments. We can’t wait to see what you do with AI functions and AI/BI dashboards!

Get started with sentiment analysis

Learn more about batch inference in our blog post on the topic. Also, check out the other AI Functions for applying AI to your data from within SQL.

 

To learn more about Databricks SQL, visit our website. You can also check out the product tour for Databricks SQL. If you want to migrate your existing warehouse to a high-performance, serverless data warehouse with a great user experience and lower total cost, Databricks SQL is the solution: try it for free.

