Automating data testing with CI pipelines, using GitHub Actions

Worried about introducing data pipeline bugs, regressions, or breaking changes? Then this post is for you. In this post, you will learn what CI is, why it is crucial to have data tests as part of CI, and how to create a CI pipeline that automatically runs data tests on pull requests using GitHub Actions.

How to choose the right tools for your data pipeline

Do you find it overwhelming to choose the right tools for your data pipeline? What if you knew the core components involved in any data pipeline and could always pick the right tools for them? Now you can! Use this framework to choose the best tools for your data pipeline.

Setting up end-to-end tests for cloud data pipelines

Worried about setting up end-to-end tests for your data pipelines? Wondering if they are worth the effort? Then, this post is for you. In this post, we go over some techniques to set up end-to-end tests. We will also see which components to prioritize while testing.

How to improve at SQL as a data engineer

Are you disappointed with online SQL tutorials that aren't deep enough? Are you frustrated knowing that you are missing SQL skills, but can't quite put your finger on which ones? This post is for you. In this post, we go over a few topics that can take your SQL skills to the next level and help you be a better data engineer.

6 Responsibilities of a Data Engineer

Unclear data engineering job description? Wondering what responsibilities fall within a data team? Then this post is for you. In this post, we go over the 6 key responsibilities of a data engineer. The number of these responsibilities that you may end up handling depends on your company and team. Teams in smaller companies generally handle all 6 responsibilities, whereas larger companies may have individual (or multiple) teams handling one (or a mix) of these responsibilities.

6 Key Concepts to Master Window Functions

In this post, we go over 6 key concepts to help you master window functions. Window functions are one of the most powerful features of SQL. They are very useful in analytics and for performing operations that cannot be done easily with the standard group by, subqueries, and filters. Despite this, window functions are not used frequently. If you have ever thought 'window functions are confusing', then this post is for you.

What are Common Table Expressions (CTEs) and when to use them?

You have heard of Common Table Expressions (CTEs), but are not sure what they are or when to use them. What if you knew exactly what CTEs were and when to use them? In this post, we go over what CTEs are and how their performance compares with subqueries, derived tables, and temp tables, to help you decide when to use them.

What's the difference between ETL & ELT?

This post goes over what the ETL and ELT data pipeline paradigms are. It tries to address the inconsistency in naming conventions and how to understand what they really mean. Finally, it ends with a comparison of the two paradigms and how to use these concepts to build efficient and scalable data pipelines.

How to add tests to your data pipelines

Trying to incorporate testing in a data pipeline? This post is for you. In this post, we go over 4 types of tests to add to your data pipeline to ensure high-quality data. We also go over how to prioritize adding these tests while developing new features.

10 Skills to Ace Your Data Engineering Interviews

Preparing for a data engineering interview and overwhelmed by all the tools and concepts? Then this post is for you. In this post, we go over the most common tools and concepts you need to know to ace your data engineering interviews.