
AI/BI Genie is a conversational experience for business teams to self-serve insights from their data through natural language. Genie leverages generative AI tailored to an organization’s data, usage patterns, and business concepts and continuously learns from user feedback. This allows non-technical users to ask questions as they would to an experienced coworker, receiving relevant and accurate responses directly from their enterprise data.

As adoption of Genie spaces grows, users need confidence in the accuracy of the insights they receive. That confidence is what allows them to make well-informed decisions based on the answers Genie delivers.

Data practitioners responsible for authoring and maintaining Genie spaces for their business teams commonly cite two critical requirements:

  • The ability to ensure the instructions and examples they maintain within the Genie space effectively improve overall accuracy.
  • When asked, the ability to verify that responses generated by Genie are correct and to communicate that confirmation back to the end user.

To address these requirements, we are excited to introduce two new features in AI/BI Genie to help build confidence in the accuracy of answers returned:

  1. Benchmarks - Genie authors can now create test questions to track overall accuracy as they update their Genie space’s instructions and settings.
  2. Request Review - End users can now request that Genie authors verify or correct responses, and receive confirmation once they do.

Benchmarks

Benchmarks allow Genie authors to systematically evaluate the accuracy of their Genie spaces. A well-crafted set of benchmark questions should include the most frequently asked user questions, along with 2-3 variations in phrasing. Authors can then run these benchmarks over time to determine whether edits to the space are effectively improving overall accuracy.

How to use Benchmarks

To better assess your Genie space’s accuracy with Benchmarks, follow these steps:

  1. Prepare: Ensure your Genie space includes clean tables and metadata. Start by manually testing a few common user questions and adding instructions to boost baseline accuracy.
  2. Add Benchmarks: The Benchmarks you add should reflect the different phrasings and versions of the common questions your users ask. For example, if your users commonly ask for Top 10 customers by total sales this year, it’d be helpful to benchmark a few versions like “Top 10 customers by revenue FY2024” and “Show me top 10 customers this year by revenue”. You then add a SQL statement that accurately answers your benchmark question. This helps the evaluation function compare Genie’s response to a source of truth for each question.
  3. Run Benchmarks + Evaluate: After you’ve built out a representative Benchmark set, click ‘Run Benchmarks’ to automatically evaluate Genie across all benchmark questions. Each question receives an assessment label: Correct or Needs Review. A question is marked Correct if Genie’s query result exactly matches the benchmark’s query result.
  4. Enhance: Double-click on specific questions to understand where Genie needs improvement. After identifying the specific questions your Genie space struggles with, make improvements to your Genie space. For example, you may discover that you need to add instructions to teach Genie how to calculate “best sales rep in Asia”. You then go to your instructions page and add an example SQL query showing Genie how to answer this question properly.
  5. Rerun Benchmarks: After improving your space’s instructions, re-run your Benchmark set to see whether overall accuracy has increased. You can track your Genie space’s accuracy over time in the Evaluations tab. Continue to add Benchmark questions as you notice new common questions from your end users.
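The steps above can be sketched in code. This is an illustrative approximation, not Genie’s implementation: the question, table, and column names are hypothetical, and the exact-match rule shown (row-for-row equality of the result sets) is an assumption based on the description in step 3.

```python
# Illustrative sketch of a benchmark entry and exact-match evaluation.
# Names and schema are hypothetical; this is not Genie's actual code.

benchmarks = [
    {
        "question": "Top 10 customers by revenue FY2024",
        # Ground-truth SQL an author might attach to the benchmark:
        "sql": """
            SELECT customer_name, SUM(sale_amount) AS revenue
            FROM sales
            WHERE fiscal_year = 2024
            GROUP BY customer_name
            ORDER BY revenue DESC
            LIMIT 10
        """,
    },
]

def evaluate(genie_rows, benchmark_rows):
    """Label a response Correct only when its result set exactly
    matches the result of the benchmark's ground-truth SQL;
    otherwise flag it for the author to inspect."""
    return "Correct" if genie_rows == benchmark_rows else "Needs Review"
```

Note how strict an exact-match rule is: even a correct answer with rows in a different order, or an extra column, would be flagged as Needs Review, which is why representative ground-truth SQL matters.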

Request Review

Genie is a powerful tool for exploratory data analysis, allowing non-technical users to ask follow-up questions and get new insights from their data without involving expert practitioners. However, just like analysis in other tools like Excel, you may want a second opinion before presenting your findings as factual.

The Request Review feature enables end-users to complete this review cycle directly in Genie—there is no need for screenshots and back-and-forths in Slack or Teams.

How to use Request Review

  1. Click the Request button: When a user receives an answer they want to verify, they can click the request icon to start a review. We recommend they add a comment explaining their request to the Genie space admin.
  2. Admin Review: After a request is sent, Genie space admins can review it on the History page, checking the original prompt, generated SQL, and any attached comments. They can mark the SQL as correct or modify it for the business user.
  3. Requestor Notified: After the admin verifies or corrects the generated SQL, the end user is notified and can review the outcome in their own view of the History page.

Conclusion

With the introduction of Benchmarks and Request Review, AI/BI Genie significantly enhances user confidence in the accuracy and reliability of the answers they receive. Benchmarks allow for systematic tracking of accuracy improvements over time, ensuring that instruction edits are effective. Request Review provides a seamless way for users to verify critical responses, fostering trust in the insights that Genie generates. Together, these new features empower business teams to confidently leverage Genie to make the critical decisions required in their daily work.

We encourage you all to start creating Genie spaces if you haven't already. Make sure to read through our AI/BI Genie documentation. To see AI/BI Dashboards and Genie in action, check out our demo and take the product tour.

The Databricks team is always looking to improve the AI/BI Genie experience, and would love to hear your feedback!
