Data Warehouse

Glossary Item
« Back to Glossary Index
Source Databricks

A data warehouse is a system that pulls together data derived from operational systems and external data sources within an organization for reporting and analysis.

A data warehouse is a central repository of information that provides users with current and historical decision support information which is difficult to access or present in the traditional operational data store. Its main focus is to provide a correlation between data from different systems, for example, it connects the product inventory stored in one system purchase to the POs issued for a specific customer, that are stored in a different system.

The data warehouse is the core of the Business Intelligence system which is built for data analysis and reporting.

A data warehouse typically resides on servers dedicated to this function running a database management system (DBMS) such as SQL Server. An Extract, Transform, and Load (ETL) software tool such as SQL Server Integration Services (SSIS) is used to obtain data from each appropriate source, including any ERP systems that might be in use. ETL tools pull data from the source systems, they read and edit it, then organize the data in a way that facilitates analysis.

A data warehouse system can also be found under the following names:

  • Decision Support System (DSS)
  • Executive Information System
  • Management Information System
  • Business Intelligence Solution
  • Analytic Application

Why Use A Data Warehouse?

Data warehouses can provide:

  • Consolidate data obtained from many sources; acting as a single point of access for all data, rather than requiring users to connect to dozens or even hundreds of systems individually. e.g., marketing, sales, finance, etc
  • An assurance of data quality
  • Historical intelligence. A data warehouse integrates disparate Source Systems along with Internal and External Data Sets as well as their historical data
    Separates analytics processing from transactional databases, improving the performance of both systems
  • Data quality, consistency, and accuracy. It uses a standard set of semantics around data, for example, consistency in naming conventions, codes for various product types, languages, currencies, and so on
  • Are built around a carefully designed data model that transforms production data from a high-speed data entry design to one that supports high-speed retrieval.
« Back to Glossary Index