Step-by-Step Guide: AI-Powered Customer Sentiment Analysis
Summary
In this blog post, we’ll follow a step-by-step guide to building a review analysis pipeline. We’ll leverage the Databricks Marketplace to import sample data, use ai_query() to build a pipeline that analyzes product reviews, and serve that analysis in an AI/BI dashboard. The final result is an interactive way for users to understand customer opinions from product reviews.
Analyzing data from free-form text reviews can provide critical insights into customer feedback. What do customers think about your business? What particular aspects can be improved?
Traditionally, understanding large volumes of unstructured text data like user reviews requires a significant investment: ML engineers must train and deploy classification and/or named entity recognition models purpose-built for each task. With AI Functions, however, Databricks makes it possible for anyone comfortable working in SQL to get answers to these questions. No bespoke modeling is required; it’s as simple as writing a few lines of SQL.
In this blog post, we’ll follow a step-by-step guide for a SQL analyst to understand trends in free-form text reviews. We’ll leverage the Databricks Marketplace to import sample data, then use ai_query() to build a pipeline that analyzes those reviews. We’ll then display and share that analysis using an AI/BI dashboard. The final result is an interactive way for users to understand customer opinions from product reviews.
Step-by-step process
Let’s walk through the process of mining opinions in Databricks SQL.
- Importing data from the Databricks Marketplace
- Building an opinion mining pipeline
- Building a dashboard
Step 1: Importing data from the Databricks Marketplace
For this example, we’ll use a sample Amazon reviews dataset from BrightData, one of the partners on the Databricks Marketplace. To access this data, navigate to the Databricks Marketplace via the left-hand navigation menu.
Databricks Marketplace is an open marketplace for data, analytics, and AI assets, powered by the open source Delta Sharing standard. It lets you discover shared datasets and other assets and use them directly within your Databricks environment to advance your analytics and AI initiatives.
To find the dataset, go to the 'Search for products' search box and type in 'amazon - best seller products'. Click the first tile to see details about the dataset and provider.
From there, click the blue 'Get instant access' button in the top right corner and accept the terms and conditions.
Instant access adds the shared dataset to Unity Catalog. Within Unity Catalog, click the 'amazon_reviews' table to see some example reviews.
Step 2: Building an opinion mining pipeline
Databricks AI Functions provide an easy way to perform analytical tasks that would be difficult or impossible with traditional SQL. The following functions are currently supported out of the box:
- ai_analyze_sentiment
- ai_classify
- ai_extract
- ai_fix_grammar
- ai_gen
- ai_mask
- ai_similarity
- ai_summarize
- ai_translate
These functions are helpful for quickly testing AI capabilities on top of text data. For instance, we can call ai_analyze_sentiment on the review text to find each review’s sentiment. This assigns a positive, negative, neutral, or mixed label to the text.
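As a sketch, assuming the shared table is reachable as amazon_reviews and the review text lives in a REVIEW_TEXT column:

```sql
-- Quick sentiment check on a few reviews using the built-in AI function.
SELECT
  REVIEW_TEXT,
  ai_analyze_sentiment(REVIEW_TEXT) AS sentiment
FROM amazon_reviews
LIMIT 10;
```

Each row comes back with one of the four sentiment labels, so you can eyeball the results before building anything more elaborate.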
For more customized use cases, Databricks recommends using ai_query(). This function lets users call any model served by Databricks Model Serving, whether a foundation, custom, or external model. ai_query() also enables prompt customization and delivers better throughput when used with a provisioned throughput endpoint.
Set up a provisioned throughput endpoint
We’ll first set up a provisioned throughput endpoint for our use case to ensure optimal throughput. If you don’t have permissions to set up an endpoint, you may need to request help from your workspace admin with these next steps. Navigate to the Serving tab in the navigation menu on the left-hand side. Click the Create serving endpoint button in the top right corner to create a new endpoint.
On the setup page, we’ll fill out the relevant fields. Name the endpoint llama-v3_1_70b-pt. For the entity, choose system.ai.meta_llama_v3_1_70b_instruct. Because we’re prototyping our solution, under the advanced configuration we’ll check the Scale to zero box so the endpoint scales down to 0 tokens/second when it’s not in use. For production use, we recommend unchecking this option to ensure low-latency responses.
Create the endpoint to begin provisioning. This may take some time to set up, so we’ll work on creating our query itself. Navigate to the SQL Editor and create a new blank draft.
Review the sample data
Now, let’s examine the sample data, specifically the REVIEW_TEXT column, which contains the free-form review text we’ll analyze.
Define SQL Query
Let’s decide what we want in the output. Ideally, we’d extract the critical opinions, record the sentiment of each, and classify each mention into a relevant category.
To produce this output, we’ll construct a SQL query that uses ai_query() to call an LLM. We’ll create a custom prompt that instructs the LLM to classify reviews into our desired categories and provide a sample review and output to improve the quality of the response.
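The exact prompt is up to you; below is a minimal sketch, assuming the llama-v3_1_70b-pt endpoint created earlier and a hypothetical category list (Price, Quality, Shipping, Usability, Other) that you should adapt to your own data:

```sql
-- Sketch: extract structured opinions from raw review text with ai_query().
-- The category list and JSON field names here are illustrative assumptions.
SELECT
  REVIEW_ID,
  REVIEW_TEXT,
  ai_query(
    'llama-v3_1_70b-pt',
    CONCAT(
      'Extract every distinct opinion from the product review below. ',
      'Respond with only a JSON array of objects, each with the fields ',
      '"Classification" (one of: Price, Quality, Shipping, Usability, Other), ',
      '"Comment" (a short quote or paraphrase), and ',
      '"Sentiment" (Positive, Negative, Mixed, or Neutral). ',
      'Review: ', REVIEW_TEXT
    )
  ) AS opinions
FROM amazon_reviews
LIMIT 10;
```

Adding a worked example review and its expected JSON to the prompt (few-shot prompting) tends to make the model’s output format more reliable.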
Copy and paste the query into the SQL editor. Once the model serving endpoint from the previous step has finished provisioning, run the query on ten reviews to see the results.
Optimize the process
Now that we know this works, we can make a few improvements.
- Extract the opinions from JSON into columns.
- Filter out any hallucinated values that don’t match our desired classifications.
- Create a materialized view to store these results so we don’t have to recompute the LLM outputs each time we want to look at our dashboard. You can refresh the materialized view whenever you want to incrementally compute more review opinions without needing to rescan the entire dataset.
To implement these improvements, copy and paste the following code into a new query.
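As a sketch (the prompt, category list, and view name are illustrative assumptions; adjust them to match the query you ran above):

```sql
-- Sketch: persist parsed opinions in a materialized view so LLM outputs
-- are computed once and can be refreshed incrementally later.
CREATE MATERIALIZED VIEW review_opinions AS
WITH raw AS (
  SELECT
    REVIEW_ID, PRODUCT_NAME, RATING, AUTHOR_NAME, REVIEW_TEXT,
    ai_query(
      'llama-v3_1_70b-pt',
      CONCAT(
        'Extract every distinct opinion from the product review below. ',
        'Respond with only a JSON array of objects, each with the fields ',
        '"Classification" (one of: Price, Quality, Shipping, Usability, Other), ',
        '"Comment", and "Sentiment" (Positive, Negative, Mixed, or Neutral). ',
        'Review: ', REVIEW_TEXT
      )
    ) AS opinions_json
  FROM amazon_reviews
)
SELECT
  REVIEW_ID, PRODUCT_NAME, RATING, AUTHOR_NAME, REVIEW_TEXT,
  opinion.Classification, opinion.Comment, opinion.Sentiment
FROM raw
  -- Parse the JSON array and fan each opinion out to its own row.
  LATERAL VIEW explode(
    from_json(
      opinions_json,
      'ARRAY<STRUCT<Classification:STRING, Comment:STRING, Sentiment:STRING>>'
    )
  ) opinions_tbl AS opinion
-- Drop any hallucinated labels that fall outside our taxonomy.
WHERE opinion.Sentiment IN ('Positive', 'Negative', 'Mixed', 'Neutral')
  AND opinion.Classification IN ('Price', 'Quality', 'Shipping', 'Usability', 'Other');
```

The WHERE clause is the hallucination filter from the list above: any opinion whose labels don’t match the taxonomy is silently dropped rather than polluting the dashboard.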
Execute this query. It may take a while to complete, so grab a cup of coffee or tea while it runs.
Step 3: Building a dashboard
Finally, we want to display these results on a shared dashboard.
Create a new dashboard
Let’s create a new AI/BI dashboard by clicking on +New > Dashboard.
Double-click the dashboard title to give it a name; we’ll call it Reviews Dashboard.
Choose the materialized view
Next, let's pull data from the materialized view we created. Go to the Data tab and click Select a table. Find the materialized view you created and import it into the dashboard.
Create visualizations
Now that we have the dataset loaded, we can create some visualizations. We’ll want to answer the following questions:
- What is the distribution of sentiment across different products?
- What product attributes are considered positive? What attributes are negative?
- Show me which reviews are considered positive vs. negative.
We’ll start by adding a text widget. Go back to the Canvas tab, add a Text Box, and place it at the top of the canvas. Use Markdown to give our dashboard a header: # Reviews Dashboard.
Now, we’re going to add some visualizations. Let’s add a row of KPIs across the top of the dashboard to show critical metrics. Click the Add a visualization tool in the bottom toolbar, and place a widget on a 1-wide by 3-tall grid. Choose Counter as the visualization type from the right-hand dropdown menu, and select count(*) as the Value. Name this widget Total Opinions.
Clone this first counter and place it to the right. Rename it Positive. Next, in the Filter fields menu, add a filter on Sentiment and choose Positive as the value.
Continue doing this to get a row of 6 widgets across the top of the dashboard for these KPIs:
- Total Opinions
- Positive
- Negative
- Mixed
- Neutral
- Avg Rating
You can optionally style these counters to make them look more appealing.
Next, we’re going to add three more charts to our dashboard:
- A bar chart showing sentiment per classification.
- X-axis: COUNT(*)
- Y-axis: Classification
- Color: Sentiment
- Layout (in X-axis kebab): Grouped
- A table showing raw results to review the actual text, with the following fields:
- Classification
- Comment
- Sentiment
- PRODUCT_NAME
- RATING
- AUTHOR_NAME
- REVIEW_ID
- REVIEW_TEXT
- A heatmap showing the distribution of sentiment per product
- X-axis: PRODUCT_NAME
- Y-axis: Sentiment
- Color: COUNT(*)
Finally, we’ll add filters to slice our dashboard by different dimensions. Place three filter widgets at the top of the canvas and set each one’s filter type to Multiple Values. On filter 1, select the PRODUCT_NAME field to filter on; for filter 2, select Sentiment; for filter 3, select Classification. Optionally rename the filters to give them friendlier names.
We’ve finished updating our dashboard! To see the final result, click the “Publish” icon at the top right corner of the dashboard. From there, you can share your dashboard with additional users to ensure anyone can see your review insights.
Conclusion
Unstructured text data like customer reviews is a treasure trove of information about your business. Previously, analyzing text required advanced knowledge of NLP techniques. With AI Functions and AI/BI dashboards, SQL users can seamlessly create engaging dashboards without building an AI or ML model. You can extend this pattern to any unstructured text analysis in SQL: classification, entity extraction, translation, and more.
Thank you for reading along. Our team would love to hear from you if you have any questions or comments. We can’t wait to see what you do with AI functions and AI/BI dashboards!
Get started with sentiment analysis
Learn more about batch inference by reading the blog post here. Also, check out other AI functions for applying AI to your data from within SQL.
To learn more about Databricks SQL, visit our website. You can also check out the product tour for Databricks SQL. If you want to migrate your existing warehouse to a high-performance, serverless data warehouse with a great user experience and lower total cost, Databricks SQL is the solution: try it for free.