Start Data Engineering

Posts

  • Change Data Capture, with Debezium Feb 15, 2023
  • Data Pipeline Design Patterns - #2. Coding patterns in Python Jan 12, 2023
  • Data Pipeline Design Patterns - #1. Data flow patterns Dec 11, 2022
  • Build Data Engineering Projects, with Free Template Oct 22, 2022
  • How to gather requirements for your data project Aug 11, 2022
  • 5 Steps to land a high paying data engineering job Jun 24, 2022
  • Setting up a local development environment for python data projects using Docker May 18, 2022
  • Data Engineering Project for Beginners - Batch edition May 11, 2022
  • What is the difference between a data lake and a data warehouse? Apr 12, 2022
  • End-to-end data engineering project - batch edition Mar 18, 2022
  • Automating data testing with CI pipelines, using Github Actions Feb 22, 2022
  • How to choose the right tools for your data pipeline Dec 12, 2021
  • Setting up end-to-end tests for cloud data pipelines Nov 11, 2021
  • How to improve at SQL as a data engineer Oct 22, 2021
  • 6 Responsibilities of a Data Engineer Oct 12, 2021
  • 6 Key Concepts, to Master Window Functions Oct 12, 2021
  • What are Common Table Expressions(CTEs) and when to use them? Oct 12, 2021
  • Whats the difference between ETL & ELT? Oct 12, 2021
  • How to add tests to your data pipelines Oct 12, 2021
  • 10 Skills to Ace Your Data Engineering Interviews Oct 11, 2021
  • What is a staging area? Oct 5, 2021
  • What is a Data Warehouse? Oct 3, 2021
  • dbt(Data Build Tool) Tutorial Sep 29, 2021
  • How to Scale Your Data Pipelines Sep 16, 2021
  • Understand & Deliver on Your Data Engineering Task Aug 29, 2021
  • 4 Key Patterns to Load Data Into A Data Warehouse Aug 17, 2021
  • How to Validate Datatypes in Python Jul 21, 2021
  • Designing a Data Project to Impress Hiring Managers Jun 25, 2021
  • How to make data pipelines idempotent May 13, 2021
  • Writing memory efficient data pipelines in Python Apr 26, 2021
  • How to gather requirements to re-engineer a legacy data pipeline Apr 8, 2021
  • How to trigger a spark job from AWS Lambda Mar 27, 2021
  • How to set up a dbt data-ops workflow, using dbt cloud and Snowflake Feb 28, 2021
  • Apache Superset Tutorial Feb 13, 2021
  • How to Join a fact and a type 2 dimension (SCD2) table Feb 7, 2021
  • How to update millions of records in MySQL? Jan 30, 2021
  • How to unit test sql transforms in dbt Jan 16, 2021
  • How to Backfill a SQL query using Apache Airflow Jan 6, 2021
  • How to do Change Data Capture (CDC), using Singer Jan 1, 2021
  • How to Pull Data from an API, Using AWS Lambda Nov 8, 2020
  • How to submit Spark jobs to EMR cluster from Airflow Oct 12, 2020
  • Data Engineering Project: Stream Edition Sep 26, 2020
  • Ensuring Data Quality, With Great Expectations Jul 26, 2020
  • Designing a "low-effort" ELT system, using stitch and dbt Jul 11, 2020
  • 3 Key techniques, to optimize your Apache Spark code Jun 19, 2020
  • What, why, when to use Apache Kafka, with an example Jun 11, 2020
  • A proven approach to land a Data Engineering job Jun 2, 2020
  • What Does It Mean for a Column to Be Indexed May 2, 2020
  • Advantages of Using dbt(Data Build Tool) Apr 25, 2020
  • Apache Airflow Review: the good, the bad Apr 18, 2020
  • Review: Building a Real Time Data Warehouse Apr 11, 2020
  • 3 Key Points to Help You Partition Late Arriving Events Apr 5, 2020
  • Scheduling a SQL script, using Apache Airflow, with an example Mar 29, 2020
  • 10 Key skills, to help you become a data engineer Mar 20, 2020

Top Posts

  1. DBT Tutorial
  2. Data Engineering Project: Batch Edition
  3. Trigger Spark Jobs from Apache Airflow
  4. How to optimize your spark jobs
  5. Add Tests to Data Pipeline

© StartDataEngineering 2023 · All rights reserved · CC BY-SA 4.0