What's the difference between ETL & ELT?

This post covers the ETL and ELT data pipeline paradigms. It addresses the inconsistency in their naming conventions and explains what people really mean when they use these terms. It ends with a comparison of the two paradigms and how to use these concepts to build efficient, scalable data pipelines.
Author

Joseph Machado

Published

October 12, 2021

Keywords

data warehouse, ETL, ELT

1. Introduction

If you are a student, analyst, engineer, or anyone working with data pipelines, you have probably heard of the ETL and ELT architectures. If you have questions like

What is the difference between ETL & ELT?

Should I use ETL or ELT pattern for my data pipeline?

Then this post is for you. In this post, we go over the definitions and differences between ETL and ELT.

2. E-T-L definition

ETL refers to extract, transform, and load; the three steps are sketched in code after the list below.

  1. Extract: The process of getting data from the source system, e.g., a Python process that pulls data from an API or reads data from an OLTP database.
  2. Transform: The process of transforming the extracted data, e.g., changing field types and names, applying business logic to the data set, enriching the data, etc.
  3. Load: The process of loading the transformed data into the data asset used by the end-user.
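
To make the three steps concrete, here is a minimal ETL sketch in Python. It is illustrative only: the records returned by `extract` stand in for an API response or OLTP query result, SQLite stands in for whatever system serves the end-user, and all table and field names are made up.

```python
import sqlite3

# --- Extract ---
# Hypothetical stand-in for pulling records from an API or OLTP database;
# in a real pipeline this might be a requests.get(...) call or a DB query.
def extract():
    return [
        {"user_id": 1, "amount_cents": 1999, "status": "complete"},
        {"user_id": 2, "amount_cents": 520, "status": "pending"},
    ]

# --- Transform ---
# Change field types & names and apply simple business logic.
def transform(rows):
    return [
        {
            "user_id": row["user_id"],
            "amount_usd": row["amount_cents"] / 100,  # change unit/type
            "status": row["status"].upper(),          # normalize values
        }
        for row in rows
    ]

# --- Load ---
# Write the transformed rows into the table used by the end-user
# (SQLite here as a stand-in for your warehouse or serving database).
def load(rows, conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (user_id INT, amount_usd REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:user_id, :amount_usd, :status)", rows
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect("example.db")
    load(transform(extract()), conn)
```

In a real pipeline, each step would typically be a separate task orchestrated by a scheduler, but the shape (extract, then transform, then load) stays the same.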

3. Differences between ETL & ELT

Traditionally, ETL has been used to refer to any data pipeline where data is pulled from the source, transformed, and loaded into the final table for use by the end-user. The transformation could be done in Python, Spark, Scala, SQL within the data warehouse, etc.

Recently, ELT has been used to refer to data pipelines where the data is transformed in the data warehouse. This can be confusing since, by definition, this is an ETL pipeline whose transformation is done using SQL in the data warehouse. Here is what people usually mean when they say ETL and ELT:

  1. ETL: The raw data is stored in some file storage (S3, local disk, etc.), transformed with Python, Spark, Scala, or another non-SQL language, and loaded into the tables to be used by the end-user.
  2. ELT: The raw data is loaded into the data warehouse and transformed using SQL into the final table to be used by the end-user (see the sketch after this list).
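
As a rough sketch of the ELT pattern, with SQLite standing in for the data warehouse and made-up table and column names, note how the raw data lands in the warehouse untouched and all transformation happens in SQL:

```python
import sqlite3

# SQLite standing in for the data warehouse.
conn = sqlite3.connect(":memory:")

# EL: land the raw, untransformed records in the warehouse as-is.
conn.execute("CREATE TABLE raw_orders (user_id INT, amount_cents INT, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1999, "COMPLETE"), (2, 520, "pending")],
)

# T: transform inside the warehouse with SQL into the end-user table
# (in practice this SQL is often managed by a tool like dbt).
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT
        user_id,
        amount_cents / 100.0 AS amount_usd,
        UPPER(status)        AS status
    FROM raw_orders
    """
)
print(conn.execute("SELECT * FROM orders").fetchall())
```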
[Figure: ETL vs ELT pipeline architectures]

Let’s compare ETL and ELT.

| Criteria | Notes | ELT | ETL |
|----------|-------|-----|-----|
| Running cost | Depending on your data warehouse, performing transformations using k8s tasks or serverless functions (lambdas) can be much cheaper than transforming data in your data warehouse. | | ✔️ |
| Engineering expertise | Both require a good grasp of distributed systems, coding, maintenance, debugging, and SQL. | ✔️ | ✔️ |
| Development time | Depending on existing tools, this may vary. If you have tools like Fivetran or dbt, ELT is a breeze. | ✔️ | |
| Transformation capabilities | Programming languages/frameworks like Python, Scala, and Spark enable complex transformations (enriching with data from an external API, running an ML model, etc.). SQL is not as powerful. | | ✔️ |
| Latency between data generation and availability for use by the end-user | In general, the T of ELT runs as a batch job, since SQL does not support streaming transformations. ETL can do batching, micro-batching, or streaming. | | ✔️ |
| SaaS tools | ELT allows for faster data feature delivery due to the availability of EL tools (Fivetran, Airbyte, etc.). | ✔️ | |

Note that the above comparison assumes a simple data pipeline; the trade-offs can change depending on your specific use case. Most companies use a mix of ETL & ELT approaches.

You may find data pipelines where the data is transformed using a framework like Apache Spark, loaded into a warehouse table, and transformed further using SQL in your data warehouse.
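
Here is a sketch of such a hybrid pipeline, assuming PySpark is available; the input path, database names, and table names are all hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hybrid_pipeline").getOrCreate()
spark.sql("CREATE DATABASE IF NOT EXISTS staging")
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

# Step 1: complex, non-SQL transformation in Spark (hypothetical input path).
raw = spark.read.json("s3://my-bucket/raw/orders/")
cleaned = raw.withColumn("status", F.upper(F.col("status")))

# Step 2: load the Spark output into a warehouse staging table
# (a Spark-managed table here, standing in for your warehouse loader).
cleaned.write.mode("overwrite").saveAsTable("staging.orders")

# Step 3: transform further with SQL in the warehouse into the final table.
final = spark.sql(
    """
    SELECT user_id,
           SUM(amount_cents) / 100.0 AS total_usd
    FROM staging.orders
    GROUP BY user_id
    """
)
final.write.mode("overwrite").saveAsTable("analytics.orders")
```

The heavy lifting (parsing, enrichment) stays in Spark, while the final, easily auditable aggregation lives in SQL.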

4. Conclusion

I hope this article clears up what people mean when they say ETL or ELT. The next time you are hit with this jargon, remember: ELT refers to a data pipeline where data is transformed using SQL in your data warehouse, while ETL refers to any data pipeline that involves moving data from one system to another. When designing a data pipeline, use the criteria below to determine whether an ELT approach fits.

  1. Development cost
  2. Running cost
  3. Engineering expertise
  4. Transformation requirements
  5. Latency requirements

If you have any questions or comments, please leave them in the comment section below.

5. Further reading

  1. What is a data warehouse
  2. Window functions
  3. Common table expressions
  4. Memory efficient data pipelines in python
  5. How to add tests to your data pipeline