Advantages of Using dbt (Data Build Tool)

In this article, we go over the reasons why someone might want to use dbt. If you are interested in running dbt, check out this article: dbt tutorial. Some common questions from data engineers about dbt are:

It is not very clear to me why I would use dbt instead of running SQL queries on Airflow.

Why would I switch from SQL scripts to dbt scripts, considering the learning curve?

These are valid questions. Engineers usually evaluate a tool based on the functionality and flexibility it provides, and not always on its ease of use and learning curve. dbt's strength lies in the latter. It is designed to solve the T part of ETL by working on raw data already present in a data warehouse. It provides less functionality than OSS ETL orchestration tools such as Airflow and Luigi, but in return it is much simpler to understand and run, especially for a non-engineer.


In recent years, data warehouses have become extremely flexible (UDFs, etc.) and powerful, with features like separation of storage and processing, elastic scaling, and machine learning capabilities (e.g. BigQuery ML). This has led many companies to load raw data first and perform the transformations inside the data warehouse (otherwise known as ELT). This is where dbt shines, as it provides an easy, version-controlled way of writing transformations using just SQL. It also provides data quality checks natively.
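To make the idea concrete, here is a minimal Python sketch of what a SQL-only transformation plus a data quality check amounts to. It is not dbt's implementation: an in-memory SQLite database stands in for the warehouse, and the table names (`raw_orders`, `orders_clean`) and the helper `failing_rows` are hypothetical. It mimics the way dbt's `unique` and `not_null` tests behave, where each test is a query that selects offending rows and passes when nothing comes back.

```python
import sqlite3

# In-memory SQLite stands in for the data warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO raw_orders VALUES
        (1, 10, 25.0),
        (2, 11, 40.0),
        (2, 11, 40.0),   -- duplicate row in the raw data
        (3, NULL, 15.0); -- missing customer
    """
)

# The "T" of ELT: a transformation written as a plain SELECT and
# materialized as a table inside the warehouse.
conn.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT DISTINCT order_id, customer_id, amount
    FROM raw_orders
    """
)


def failing_rows(query):
    """Run a test query; every row it returns counts as a failure."""
    return conn.execute(query).fetchall()


# Uniqueness check: select keys that appear more than once.
unique_failures = failing_rows(
    "SELECT order_id FROM orders_clean GROUP BY order_id HAVING COUNT(*) > 1"
)

# Not-null check: select rows where the column is missing.
not_null_failures = failing_rows(
    "SELECT order_id FROM orders_clean WHERE customer_id IS NULL"
)

print("unique(order_id) passed:", not unique_failures)
print("not_null(customer_id) passed:", not not_null_failures)
```

In this sketch the uniqueness check passes because the transformation de-duplicates the raw rows, while the not-null check flags order 3. In dbt itself, such checks are declared alongside the model rather than hand-written like this.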

The key reasons why someone would want to use dbt are:

  1. Easy to use for non-engineers (shared data knowledge between engineering and non-engineering teams)

  2. Extremely flexible data model (recreate data easily, backfills are easy)

  3. If most of your transformations happen at the data warehouse level, dbt makes them extremely easy to write and run

  4. Built-in testing for data quality

  5. Online, searchable data catalog and lineage

  6. Reusable macros

  7. Shockingly low learning curve

  8. Production runs using dbt Cloud or through an Airflow trigger
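The lineage and easy-rebuild points above rest on one idea: models refer to each other by name, so the run order can be derived instead of maintained by hand. The following is a rough Python sketch of that idea, not how dbt is implemented. dbt models do reference each other with `ref()`, but the model names and SQL here are hypothetical, and the regular expression is a simplification of dbt's actual template rendering.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: model name -> SQL text containing dbt-style ref() calls.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.customer_id"
    ),
    "customer_ltv": "select * from {{ ref('customer_orders') }}",
}

# Simplified matcher for {{ ref('model_name') }}; an assumption for this sketch.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_lineage(models):
    """Map each model to the set of models it depends on."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


lineage = build_lineage(models)

# A topological sort yields an order in which every model runs
# only after all of the models it references.
run_order = list(TopologicalSorter(lineage).static_order())

print(lineage)
print(run_order)
```

Because the order is computed from the references, adding a new model or rebuilding everything from raw data does not require editing a hand-written schedule. The same graph is what a lineage view is drawn from.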

Conclusion

If you are building a data pipeline where multiple engineers and non-engineers are stakeholders in how the data is transformed, and you have a powerful data warehouse to support such requirements, dbt is a very competitive choice. It frees you from having to manage dependencies, supports tests natively, and has a very low learning curve, which enables both engineers and non-engineers to contribute to the transformation logic.

If you are interested in learning how to set up and run dbt, check out this article: dbt tutorial. Let me know if you have any questions or comments in the comments section below.