3 Key Techniques to Optimize Your Apache Spark Code

This post covers key techniques to optimize your Apache Spark code. You will learn what distributed data storage and distributed data processing systems are, how they operate, and how to use them efficiently. Go beyond the basic syntax and learn 3 powerful strategies to drastically improve the performance of your Apache Spark project.

Advantages of Using dbt (Data Build Tool)

In this article, we go over the reasons why someone might want to use dbt. If you are interested in learning dbt, check out this article. Some common questions from data engineers about dbt are:

“It is not very clear to me why I would use dbt instead of running SQL queries on Airflow.”

Review: Building a Real-Time Data Warehouse

Many data engineers coming from traditional batch processing frameworks have questions about real-time data processing systems, such as:

“What kind of data model did you implement for real-time processing?”