Start Data Engineering

    Posts

  • Using Joins and Group Bys the right way for data warehousing Jun 10, 2025
  • CTEs(Common Table Expression) or Temporary Tables for Spark SQL Jun 7, 2025
  • Advanced SQL is knowing how to model the data & get there effectively Jun 3, 2025
  • Data Engineering Interview Preparation Series #3: SQL May 5, 2025
  • How to Extract Data from APIs for Data Pipelines using Python Apr 14, 2025
  • How to create an SCD2 Table using MERGE INTO with Spark & Iceberg Apr 5, 2025
  • How to quickly deliver data to business users? #1. Adv Data types & Schema evolution Mar 18, 2025
  • How to Manage Upstream Schema Changes in Data Driven Fast Moving Company Mar 1, 2025
  • Visual Studio Code (VSCode) extensions for data engineers Feb 16, 2025
  • Should Data Pipelines in Python be Function based or Object-Oriented (OOP)? Feb 10, 2025
  • How to turn a 1000-line messy SQL into a modular, & easy-to-maintain data pipeline? Feb 3, 2025
  • How to ensure consistent metrics in your warehouse Jan 28, 2025
  • Data Engineering Interview Preparation Series #2: System Design Jan 20, 2025
  • How to reference a seed from a different dbt project? Dec 19, 2024
  • What do Snowflake, Databricks, Redshift, BigQuery actually do? Nov 22, 2024
  • 25 SQL tips to level up your data engineering skills Oct 17, 2024
  • How to use nested data types effectively in SQL Oct 14, 2024
  • How to decide on a data project for your portfolio Sep 23, 2024
  • How to build a data project with step-by-step instructions Sep 18, 2024
  • What are the Key Parts of Data Engineering? Sep 5, 2024
  • Data Engineering Interview Preparation Series #1: Data Structures and Algorithms Aug 13, 2024
  • How to implement data quality checks with greatexpectations Jul 26, 2024
  • What are the types of data quality checks? Jul 16, 2024
  • SQL or Python for Data Transformations? Jul 1, 2024
  • Why use Apache Airflow (or any orchestrator)? Jun 24, 2024
  • Data Engineering Projects Jun 14, 2024
  • Data Engineering Project for Beginners - Batch edition Jun 12, 2024
  • Build Data Engineering Projects, with Free Template Jun 11, 2024
  • Python Essentials for Data Engineers May 30, 2024
  • dbt(Data Build Tool) Tutorial May 29, 2024
  • Building Cost Efficient Data Pipelines with Python & DuckDB May 28, 2024
  • Enable stakeholder data access with Text-to-SQL RAGs May 21, 2024
  • How to reduce your Snowflake cost May 9, 2024
  • How to test PySpark code with pytest Apr 22, 2024
  • Docker Fundamentals for Data Engineers Apr 22, 2024
  • Data Engineering Best Practices - #2. Metadata & Logging Feb 22, 2024
  • Uplevel your dbt workflow with these tools and techniques Dec 13, 2023
  • What is an Open Table Format? & Why to use one? Nov 14, 2023
  • 6 Steps to Avoid Messy Data in Your Warehouse Oct 25, 2023
  • Data Engineering Best Practices - #1. Data flow & Code Jul 20, 2023
  • What is a self-serve data platform & how to build one Jun 30, 2023
  • How to become a valuable data engineer Jun 13, 2023
  • Data Engineering Project: Stream Edition May 15, 2023
  • Change Data Capture, with Debezium Feb 15, 2023
  • Data Pipeline Design Patterns - #2. Coding patterns in Python Jan 12, 2023
  • Data Pipeline Design Patterns - #1. Data flow patterns Dec 11, 2022
  • How to gather requirements for your data project Aug 11, 2022
  • 5 Steps to land a high paying data engineering job Jun 24, 2022
  • Setting up a local development environment for python data projects using Docker May 18, 2022
  • What is the difference between a data lake and a data warehouse? Apr 12, 2022
  • End-to-end data engineering project - batch edition Mar 18, 2022
  • Automating data testing with CI pipelines, using Github Actions Feb 22, 2022
  • How to choose the right tools for your data pipeline Dec 12, 2021
  • Setting up end-to-end tests for cloud data pipelines Nov 11, 2021
  • How to improve at SQL as a data engineer Oct 22, 2021
  • 6 Responsibilities of a Data Engineer Oct 12, 2021
  • 6 Key Concepts, to Master Window Functions Oct 12, 2021
  • What are Common Table Expressions(CTEs) and when to use them? Oct 12, 2021
  • Whats the difference between ETL & ELT? Oct 12, 2021
  • How to add tests to your data pipelines Oct 12, 2021
  • 10 Skills to Ace Your Data Engineering Interviews Oct 11, 2021
  • What is a staging area? Oct 5, 2021
  • What is a Data Warehouse? Oct 3, 2021
  • How to Scale Your Data Pipelines Sep 16, 2021
  • Understand & Deliver on Your Data Engineering Task Aug 29, 2021
  • 4 Key Patterns to Load Data Into A Data Warehouse Aug 17, 2021
  • How to Validate Datatypes in Python Jul 21, 2021
  • Designing a Data Project to Impress Hiring Managers Jun 25, 2021
  • How to make data pipelines idempotent May 13, 2021
  • Writing memory efficient data pipelines in Python Apr 26, 2021
  • How to gather requirements to re-engineer a legacy data pipeline Apr 8, 2021
  • How to trigger a spark job from AWS Lambda Mar 27, 2021
  • How to set up a dbt data-ops workflow, using dbt cloud and Snowflake Feb 28, 2021
  • Apache Superset Tutorial Feb 13, 2021
  • How to Join a fact and a type 2 dimension (SCD2) table Feb 7, 2021
  • How to update millions of records in MySQL? Jan 30, 2021
  • How to unit test sql transforms in dbt Jan 16, 2021
  • How to Backfill a SQL query using Apache Airflow Jan 6, 2021
  • How to do Change Data Capture (CDC), using Singer Jan 1, 2021
  • How to Pull Data from an API, Using AWS Lambda Nov 8, 2020
  • How to submit Spark jobs to EMR cluster from Airflow Oct 12, 2020
  • Ensuring Data Quality, With Great Expectations Jul 26, 2020
  • Designing a "low-effort" ELT system, using stitch and dbt Jul 11, 2020
  • 3 Key techniques, to optimize your Apache Spark code Jun 19, 2020
  • What, why, when to use Apache Kafka, with an example Jun 11, 2020
  • A proven approach to land a Data Engineering job Jun 2, 2020
  • What Does It Mean for a Column to Be Indexed May 2, 2020
  • Advantages of Using dbt(Data Build Tool) Apr 25, 2020
  • Apache Airflow Review: the good, the bad Apr 18, 2020
  • Review: Building a Real Time Data Warehouse Apr 11, 2020
  • 3 Key Points to Help You Partition Late Arriving Events Apr 5, 2020
  • Scheduling a SQL script, using Apache Airflow, with an example Mar 29, 2020
  • 10 Key skills, to help you become a data engineer Mar 20, 2020

Top Posts

  1. DBT Tutorial
  2. Data Engineering Project: Batch Edition

© StartDataEngineering 2025 · All rights reserved · CC BY-SA 4.0

Land your dream Data Engineering job!

Overwhelmed by all the concepts you need to learn to become a data engineer? Have difficulty finding good data projects for your portfolio? Are online tutorials littered with sponsored tools and not foundational concepts?

Learning data engineering can be a long and rough road, but it doesn't have to be!

Pick up any new tool/framework with a clear understanding of data engineering fundamentals. Demonstrate your expertise by building well-documented real-world projects on GitHub.

Sign up for my free DE-101 course that will take you from basics to building data projects in 4 weeks!
