What Is Data Intelligence?

Data intelligence is the process of using artificial intelligence (AI) systems to learn from, understand and reason about an organization’s data, enabling the creation of custom AI applications and democratizing access to data across the enterprise.

How does data intelligence work?

Data intelligence works by using both generative AI and traditional AI models to build a comprehensive understanding of an organization’s enterprise data and how it is used. It learns from signals captured across the organization’s data estate, including its data catalog, SQL queries, BI dashboards, notebooks, data pipelines and documentation. This gives the AI a nuanced understanding of the business’s concepts, semantics and unique data environment. As a result, it can provide significantly more accurate answers than a naive use of large language models (LLMs) trained solely on public internet data.
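
As a rough illustration of this idea, the sketch below grounds a natural language question in organization-specific metadata before it would be handed to a language model. Everything in it is hypothetical: the CatalogEntry type, the sample tables, the word-overlap scoring and the prompt layout are assumptions made for illustration and are not part of any Databricks API. A real system would learn these signals with AI models rather than with simple word matching.

```python
import re
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical record describing one table known to the organization."""
    table: str
    description: str
    example_query: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_entries(question: str, entries: list[CatalogEntry]) -> list[CatalogEntry]:
    """Order entries by how many words they share with the question."""
    question_words = tokenize(question)

    def score(entry: CatalogEntry) -> int:
        entry_text = " ".join([entry.table, entry.description, entry.example_query])
        return len(question_words & tokenize(entry_text))

    return sorted(entries, key=score, reverse=True)


def build_prompt(question: str, entries: list[CatalogEntry], top_k: int = 1) -> str:
    """Prepend the best-matching catalog entries to the question."""
    lines = ["Context from the organization's catalog:"]
    for entry in rank_entries(question, entries)[:top_k]:
        lines.append(f"- {entry.table}: {entry.description}")
        lines.append(f"  example query: {entry.example_query}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Made-up sample metadata standing in for signals from a catalog and query history.
ENTRIES = [
    CatalogEntry(
        table="sales.arr_by_region",
        description="Annual recurring revenue (ARR) per sales region",
        example_query="SELECT region, SUM(arr) FROM sales.arr_by_region GROUP BY region",
    ),
    CatalogEntry(
        table="hr.headcount",
        description="Employee headcount per department",
        example_query="SELECT department, COUNT(*) FROM hr.headcount GROUP BY department",
    ),
]

QUESTION = "What was ARR by region last quarter?"

if __name__ == "__main__":
    print(build_prompt(QUESTION, ENTRIES))
```

The point of the sketch is the shape of the flow: the acronym "ARR" only becomes answerable because the context carries the organization's own definition and a known query pattern, which a model trained only on public data would not have.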

What are the benefits of data intelligence?

Data intelligence offers the following benefits to organizations:

  • Boosts productivity with data and AI through natural language access: Leveraging AI models, data intelligence enables working with data in natural language, tailored to each organization’s jargon and acronyms. It observes how data is used in existing workloads to learn the organization’s terms and offers a tailored natural language interface to all users, from nonexperts to data scientists and engineers.
  • Improves semantic cataloging and discovery of data and AI assets: Generative AI can learn each organization’s data model, metrics and KPIs to improve search and discovery and to automatically identify discrepancies in how data is being used.
  • Automates data management and optimization: Data intelligence models can optimize data layout, partitioning and indexing based on data usage, reducing the need for manual tuning and knob configuration.
  • Enhances governance and privacy: Data intelligence allows organizations to automatically detect, classify and prevent misuse of sensitive data while simplifying data management using natural language.
  • Offers first-class support for AI workloads: Data intelligence enhances enterprise AI applications by allowing them to connect to the relevant business data and leverage the semantics learned (e.g., metrics, KPIs) to deliver relevant and accurate results. Using data intelligence, AI application developers no longer have to “hack” intelligence together through brittle prompt engineering.
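
To make the governance benefit above more concrete, here is a minimal sketch of the classification step: labeling columns that appear to hold sensitive values. It uses fixed regular expressions as a simple stand-in for the learned models the text describes. The patterns, the labels, the 0.8 threshold and the sample columns are all assumptions chosen for illustration.

```python
import re

# Hypothetical patterns; a production system would rely on learned classifiers
# and far broader coverage than these two rules.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_column(values, threshold=0.8):
    """Return the first label whose pattern fully matches at least
    `threshold` of the non-empty values, or None if no label qualifies."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= threshold:
            return label
    return None


def classify_table(columns):
    """Map each column name to its detected sensitivity label (or None)."""
    return {name: classify_column(values) for name, values in columns.items()}


# Made-up sample data: column name -> sampled values.
SAMPLE = {
    "contact": ["ana@example.com", "bo@example.org", ""],
    "tax_id": ["123-45-6789", "987-65-4321"],
    "city": ["Lisbon", "Osaka"],
}

if __name__ == "__main__":
    for column, label in classify_table(SAMPLE).items():
        print(column, "->", label)
```

Labels produced this way could then feed access policies or masking rules. The threshold tolerates a few malformed values in an otherwise sensitive column.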

Use cases for data intelligence

Data intelligence is utilized across various industries, from finance and healthcare to energy, and is transforming the way businesses operate. Here are a few examples that demonstrate how data intelligence helps companies understand their customers, improve processes, detect fraud and more:

  • Finance: This sector uses data intelligence to manage financial risks, predict economic trends and ensure regulatory compliance. Banks and other financial institutions analyze data to assess creditworthiness, identify fraud and categorize customers.
  • Retail and CPG: These industries leverage data intelligence to understand customer preferences, better manage inventory, optimize supply chains and personalize marketing strategies for individual customers.
  • Public Sector: In the public sector, data intelligence is crucial for enhancing services and making informed policy decisions. Government agencies use data to monitor economic changes and improve service delivery.
  • Insurance: Companies in this industry utilize data intelligence to evaluate risks, set insurance premiums and detect fraudulent claims. By analyzing large datasets, they gain a clearer understanding of risks and streamline the claims process.
  • Healthcare: Healthcare organizations apply data intelligence to enhance patient care, control costs and conduct research. Data analytics supports medical decision-making and helps identify effective treatments.
  • Energy: In the energy sector, companies use data analysis to monitor and forecast energy usage and improve the efficiency of the power grid.

While data intelligence applications may vary across industries, the common goal remains the same: to extract valuable insights from data and leverage them to drive business growth and enhance customer experiences.

Key technology enabling data intelligence platforms

A data intelligence platform is an architecture built on a data lakehouse, which combines the best features of data lakes and data warehouses to provide an open, unified foundation for all data and governance. It is powered by a Data Intelligence Engine that understands the uniqueness of an organization’s data. Key technologies enabling the Data Intelligence Platform include:

  1. Open and unified data storage
    • Cloud storage services: Such as Amazon S3, Google Cloud Storage and Azure Data Lake Storage to provide scalable and cost-effective storage
    • Open data formats: Including Delta Lake (with UniForm) and Apache Iceberg, open source table formats that bring ACID transactions to data stored in file formats like Parquet, enabling reliable data operations and management
  2. Open metadata and governance services
    • Unity Catalog: Provides open data governance and metadata management for data lakehouses
    • Hive metastore: A central repository that stores metadata for Hive tables and databases, facilitating data discovery and management
  3. Distributed data processing
    • Apache Spark™ and Spark Structured Streaming: A unified analytics engine for large-scale data processing that supports batch and real-time stream processing
  4. Query engines
    • Databricks Photon: A next-generation engine that provides extremely fast query performance at low cost for data ingestion, ETL, streaming, data warehousing, data science and interactive queries — directly on the data lake
  5. Machine learning and MLOps
    • MLflow: An open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment
    • Mosaic AI: Tools that accelerate the development and deployment of traditional and generative AI models by optimizing and automating machine learning workflows
  6. Compound AI systems
    • Compound AI systems use signals from an organization’s data platform, including the data catalog, dashboards, notebooks, data pipelines and documentation, to create highly specialized and accurate generative AI models that understand the organization’s data, usage patterns and business concepts.