Extract Transform Load [ETL]

Source: Databricks

ETL stands for Extract, Transform, Load. It refers to the process of collecting data from numerous disparate databases, applications, and systems; transforming the data so that it matches the target system's required format; and loading it into a destination database.


How does ETL work?

Extract

The first step of the process is extracting data from the source systems, which could include an ERP, a CRM, streaming sources, and other enterprise systems, as well as data from third-party sources.

There are three common data extraction methods:

  1. Update notification – The easiest way to obtain the data is when the source system notifies you as soon as a record has been changed.
  2. Partial extraction – Not all systems can provide a notification when an update takes place; however, they can identify which records have been changed and provide an extract of just those records.
  3. Full extraction – Certain systems cannot identify which data has been changed at all. In this case, a full extract is the only way to get the data out of the system. This method requires keeping a copy of the previous extract in the same format so that the changes can be identified by comparison.
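With a full extraction, change detection can be done by comparing the new extract against the previous one. A minimal sketch in Python, assuming each record carries a unique "id" field (the function and field names here are illustrative, not from any specific tool):

```python
def diff_extracts(previous, current):
    """Compare two full extracts and return (new_records, changed_records)."""
    prev_by_id = {r["id"]: r for r in previous}
    # Records whose id did not exist in the previous extract are new
    new_records = [r for r in current if r["id"] not in prev_by_id]
    # Records whose id existed but whose contents differ have changed
    changed_records = [
        r for r in current
        if r["id"] in prev_by_id and r != prev_by_id[r["id"]]
    ]
    return new_records, changed_records

previous = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
current = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Rob"}, {"id": 3, "name": "Cy"}]
new, changed = diff_extracts(previous, current)
```

This is why keeping the last extract in the same format matters: without a baseline to compare against, every full extract would have to be reloaded in its entirety.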

Transform

Next, the transform step converts the raw data that has been extracted from the source systems. Because the data cannot be used in its original form, it is cleansed, mapped, and transformed, often to a specific data schema, so that it meets operational needs.

This process entails several transformation types that ensure the quality and integrity of the data. Below are the most common basic transformation types, as well as the advanced types that prepare data for analysis:

  •       Basic transformations:
    •       Cleaning
    •       Format revision
    •       Data threshold validation checks
    •       Restructuring
    •       Deduplication
  •       Advanced transformations:
    •       Filtering
    •       Merging
    •       Splitting
    •       Derivation
    •       Summarization
    •       Integration
    •       Aggregation
    •       Complex data validation
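A few of the transformations above (cleaning, format revision, deduplication, and aggregation) can be sketched in plain Python. The field names ("country", "amount") and the rules applied are illustrative assumptions, not a prescribed schema:

```python
from collections import defaultdict

def transform(rows):
    # Cleaning / format revision: trim whitespace, normalize case,
    # cast amounts to float, and drop incomplete rows
    cleaned = [
        {"country": r["country"].strip().upper(), "amount": float(r["amount"])}
        for r in rows
        if r.get("country", "").strip() and r.get("amount") is not None
    ]
    # Deduplication: keep only one copy of identical rows
    seen, deduped = set(), []
    for r in cleaned:
        key = (r["country"], r["amount"])
        if key not in seen:
            seen.add(key)
            deduped.append(r)
    # Aggregation / summarization: total amount per country
    totals = defaultdict(float)
    for r in deduped:
        totals[r["country"]] += r["amount"]
    return dict(totals)

rows = [
    {"country": " us ", "amount": "10"},
    {"country": "US", "amount": "10"},   # duplicate once cleaned
    {"country": "de", "amount": "5"},
    {"country": "", "amount": "7"},      # dropped: missing country
]
```

In practice these steps would run in a data-processing engine rather than hand-written loops, but the shape of the work is the same.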

Stage – Data is usually not loaded directly into the target data warehouse; instead, it is common to upload it into a staging database first. This step allows a quick rollback in case something does not go as planned. During this stage, you can also generate audit reports for regulatory compliance, or diagnose and repair any data issues.

Load

Finally, the load step writes the converted data from the staging area to a target database, which may or may not have previously existed. Depending on the requirements of the application, this process may be either quite simple or quite intricate.
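A simple version of this final step copies validated rows from the staging table into the target table in one transaction, then clears staging for the next run. Again, SQLite and the table names are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (country TEXT, total REAL)")
conn.execute("CREATE TABLE sales (country TEXT, total REAL)")  # target table
conn.executemany("INSERT INTO staging_sales VALUES (?, ?)",
                 [("US", 10.0), ("DE", 5.0)])

with conn:  # load staging into the target atomically
    conn.execute("INSERT INTO sales SELECT country, total FROM staging_sales")
    conn.execute("DELETE FROM staging_sales")  # clear staging after the load

loaded = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
remaining = conn.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0]
```

An intricate load might instead merge on keys, maintain history tables, or partition writes, but the staging-to-target movement is the same idea.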
