The advancement of cloud data warehouses that cost-effectively store and process data is changing the way organizations manage their analytics data. The shift from on-premises physical servers to cloud warehouses is causing a change from ETL to ELT.

The previous paradigm was primarily based on ELT: extracting, loading, and transforming data to be stored. It is the process that a data pipeline uses to replicate data from a source system to a destination system, such as a data warehouse in the cloud. The steps can be summarized as follows:

  • Extraction – This first step is to copy data from the source system.
  • Loading – During this step, the pipeline replicates the data from the source to the target system, which can be a data warehouse or a data lake.
  • Transformation – Once the data is on the target system, organizations can run the transformations they need. The data is transformed raw. In this way, different users can use it with various tools or processes.

ELT is a modern variation on the earlier extract, transform, and load process. The transformations happen before the data is loaded—running transformations before the load phase offers a more streamlined data replication process.

ETL tools require computational and processing capabilities to run transformations before loading data to a target. In ELT, organizations use the processing engines at the targets to efficiently transform data within the target system itself. Eliminating this intermediate step speeds up the data loading process.

ELT stands for extract, load, and transform and is a process where data is used through a data warehouse to perform fundamental transformations. That means there is no need to retain the data in a database after use. ELT uses cloud-based data warehouses for different types of data, including structured, unstructured, semi-structured, and even raw data types.

The ELT process is also generally incorporated into the use of data lakes. These data calls refer to types of data stores that, unlike OLAP data stores, allow the storage of any kind of structured or unstructured data. Data lakes do not require you to transform your data before loading and will enable any type of raw data to be loaded immediately, regardless of the format or lack of it.

However, data cleansing, enrichment, and transformation occur after data is loaded. However, data transformation is still necessary before analyzing the data with a business intelligence platform. Some of the characteristics of ELT processes and data lakes are:

  • ELT is a relatively new technology emerging from the implementation of cloud-based server technologies. Cloud-based data warehouses offer nearly infinite storage capacities and scalable processing power.
  • Using ELT in conjunction with a data lake allows you to ingest an ever-expanding set of raw data immediately as it becomes available. You do not need to transform the data into a unique format before saving it to the data lake.
  • An ELT process transforms only the data required for a particular analysis. Although it can slow down the data analysis process, it is more flexible because you can change your data in different ways on the fly.
  • It is important to note that ELT tools and systems are still being developed, so they do not have the same level of maturity as ETL paired with an OLAP database.

Since the ETL process transforms the data before the upload stage, it is the ideal process when a destination requires a specific data format. An example of such a situation is when there is a mismatch in supported data types. For example, the source system supports Parquet files, and the destination only accepts CSV. There is limited ability to scale processing on a destination rapidly, or security restrictions make it impossible to store raw data on a destination.

 

Satori logo2 white