Start Data Engineering

  • Home
  • Courses
  • Free Email Course
  • Newsletter
  • YouTube
  • Posts
  • Tags
  • Contact Us

    Posts

  • How to quickly set up a local Spark development environment? Aug 5, 2025
  • Using Joins and Group Bys the right way for data warehousing Jun 10, 2025
  • CTEs(Common Table Expression) or Temporary Tables for Spark SQL Jun 7, 2025
  • Advanced SQL is knowing how to model the data & get there effectively Jun 3, 2025
  • Data Engineering Interview Preparation Series #3: SQL May 5, 2025
  • How to Extract Data from APIs for Data Pipelines using Python Apr 14, 2025
  • How to create an SCD2 Table using MERGE INTO with Spark & Iceberg Apr 5, 2025
  • How to quickly deliver data to business users? #1. Adv Data types & Schema evolution Mar 18, 2025
  • How to Manage Upstream Schema Changes in Data Driven Fast Moving Company Mar 1, 2025
  • Visual Studio Code (VSCode) extensions for data engineers Feb 16, 2025
  • Should Data Pipelines in Python be Function based or Object-Oriented (OOP)? Feb 10, 2025
  • How to turn a 1000-line messy SQL into a modular, & easy-to-maintain data pipeline? Feb 3, 2025
  • How to ensure consistent metrics in your warehouse Jan 28, 2025
  • Data Engineering Interview Preparation Series #2: System Design Jan 20, 2025
  • How to reference a seed from a different dbt project? Dec 19, 2024
  • What do Snowflake, Databricks, Redshift, BigQuery actually do? Nov 22, 2024
  • 25 SQL tips to level up your data engineering skills Oct 17, 2024
  • How to use nested data types effectively in SQL Oct 14, 2024
  • How to decide on a data project for your portfolio Sep 23, 2024
  • How to build a data project with step-by-step instructions Sep 18, 2024
  • What are the Key Parts of Data Engineering? Sep 5, 2024
  • Data Engineering Interview Preparation Series #1: Data Structures and Algorithms Aug 13, 2024
  • How to implement data quality checks with greatexpectations Jul 26, 2024
  • What are the types of data quality checks? Jul 16, 2024
  • SQL or Python for Data Transformations? Jul 1, 2024
  • Why use Apache Airflow (or any orchestrator)? Jun 24, 2024
  • Data Engineering Projects Jun 14, 2024
  • Data Engineering Project for Beginners - Batch edition Jun 12, 2024
  • Build Data Engineering Projects, with Free Template Jun 11, 2024
  • Python Essentials for Data Engineers May 30, 2024
  • dbt(Data Build Tool) Tutorial May 29, 2024
  • Building Cost Efficient Data Pipelines with Python & DuckDB May 28, 2024
  • Enable stakeholder data access with Text-to-SQL RAGs May 21, 2024
  • How to reduce your Snowflake cost May 9, 2024
  • How to test PySpark code with pytest Apr 22, 2024
  • Docker Fundamentals for Data Engineers Apr 22, 2024
  • Data Engineering Best Practices - #2. Metadata & Logging Feb 22, 2024
  • Uplevel your dbt workflow with these tools and techniques Dec 13, 2023
  • What is an Open Table Format? & Why to use one? Nov 14, 2023
  • 6 Steps to Avoid Messy Data in Your Warehouse Oct 25, 2023
  • Data Engineering Best Practices - #1. Data flow & Code Jul 20, 2023
  • What is a self-serve data platform & how to build one Jun 30, 2023
  • How to become a valuable data engineer Jun 13, 2023
  • Data Engineering Project: Stream Edition May 15, 2023
  • Change Data Capture, with Debezium Feb 15, 2023
  • Data Pipeline Design Patterns - #2. Coding patterns in Python Jan 12, 2023
  • Data Pipeline Design Patterns - #1. Data flow patterns Dec 11, 2022
  • How to gather requirements for your data project Aug 11, 2022
  • 5 Steps to land a high paying data engineering job Jun 24, 2022
  • Setting up a local development environment for python data projects using Docker May 18, 2022
  • What is the difference between a data lake and a data warehouse? Apr 12, 2022
  • End-to-end data engineering project - batch edition Mar 18, 2022
  • Automating data testing with CI pipelines, using Github Actions Feb 22, 2022
  • How to choose the right tools for your data pipeline Dec 12, 2021
  • Setting up end-to-end tests for cloud data pipelines Nov 11, 2021
  • How to improve at SQL as a data engineer Oct 22, 2021
  • 6 Responsibilities of a Data Engineer Oct 12, 2021
  • 6 Key Concepts, to Master Window Functions Oct 12, 2021
  • What are Common Table Expressions(CTEs) and when to use them? Oct 12, 2021
  • What's the difference between ETL & ELT? Oct 12, 2021
  • How to add tests to your data pipelines Oct 12, 2021
  • 10 Skills to Ace Your Data Engineering Interviews Oct 11, 2021
  • What is a staging area? Oct 5, 2021
  • What is a Data Warehouse? Oct 3, 2021
  • How to Scale Your Data Pipelines Sep 16, 2021
  • Understand & Deliver on Your Data Engineering Task Aug 29, 2021
  • 4 Key Patterns to Load Data Into A Data Warehouse Aug 17, 2021
  • How to Validate Datatypes in Python Jul 21, 2021
  • Designing a Data Project to Impress Hiring Managers Jun 25, 2021
  • How to make data pipelines idempotent May 13, 2021
  • Writing memory efficient data pipelines in Python Apr 26, 2021
  • How to gather requirements to re-engineer a legacy data pipeline Apr 8, 2021
  • How to trigger a spark job from AWS Lambda Mar 27, 2021
  • How to set up a dbt data-ops workflow, using dbt cloud and Snowflake Feb 28, 2021
  • Apache Superset Tutorial Feb 13, 2021
  • How to Join a fact and a type 2 dimension (SCD2) table Feb 7, 2021
  • How to update millions of records in MySQL? Jan 30, 2021
  • How to unit test sql transforms in dbt Jan 16, 2021
  • How to Backfill a SQL query using Apache Airflow Jan 6, 2021
  • How to do Change Data Capture (CDC), using Singer Jan 1, 2021
  • How to Pull Data from an API, Using AWS Lambda Nov 8, 2020
  • How to submit Spark jobs to EMR cluster from Airflow Oct 12, 2020
  • Ensuring Data Quality, With Great Expectations Jul 26, 2020
  • Designing a "low-effort" ELT system, using stitch and dbt Jul 11, 2020
  • 3 Key techniques, to optimize your Apache Spark code Jun 19, 2020
  • What, why, when to use Apache Kafka, with an example Jun 11, 2020
  • A proven approach to land a Data Engineering job Jun 2, 2020
  • What Does It Mean for a Column to Be Indexed May 2, 2020
  • Advantages of Using dbt(Data Build Tool) Apr 25, 2020
  • Apache Airflow Review: the good, the bad Apr 18, 2020
  • Review: Building a Real Time Data Warehouse Apr 11, 2020
  • 3 Key Points to Help You Partition Late Arriving Events Apr 5, 2020
  • Scheduling a SQL script, using Apache Airflow, with an example Mar 29, 2020
  • 10 Key skills, to help you become a data engineer Mar 20, 2020

Top Posts

  1. DBT Tutorial
  2. Data Engineering Project: Batch Edition
© StartDataEngineering 2025 · All rights reserved CC BY-SA 4.0 Privacy Policy

Land your dream Data Engineering job with my free book!

Are you looking to enter the field of data engineering? Are you:

> Overwhelmed by all the concepts/jargon/frameworks of data engineering?

> Feeling lost because there is no clear roadmap for someone to quickly get up to speed with the essentials of data engineering?

Learning to be a data engineer can be a long and rough road, but it doesn't have to be!

Imagine knowing the fundamentals of data engineering that are crucial to any data team. You will be able to quickly pick up any new tool or framework.

Sign up for my free Data Engineering 101 Course. You will get:

✅ Instant access to my Data Engineering 101 e-book, which covers SQL, Python, Docker, dbt, Airflow & Spark.

✅ Executable code to practice and exercises to test yourself.

✅ Weekly emails for 4 weeks with the exercise solutions.

Join now and get started on your data engineering journey!

    Testimonials:

    I really appreciate you putting these detailed posts together for your readers, you explain things in such a detailed, simple manner that's well organized and easy to follow. I appreciate it so so much!
    I have learned a lot from the course which is much more practical.
    This course helped me build a project and actually land a data engineering job! Thank you.

    When you subscribe, you'll also get emails about data engineering concepts, development practices, career advice, and projects every 2 weeks (or so) to help you level up your data engineering skills. We respect your email privacy.