5 SQL Concepts Every Data Engineer Should Know

List of key SQL concepts every data engineer should know

List of key SQL concepts every data engineer should know
TECHNICAL UPSKILL
BEST PRACTICE
REAL WORLD
Author
Published

June 18, 2026

Warehouse transformation patterns

  • TL;DR: Facts(events) and dimensions (business entity)
  • Joins are used to enrich data(left join), or get complete data belonging to the same business entity(inner join), or get any data belonging to the same business entity (full outer join)
  • Group by is used to condense numerical information into easy-to-understand numbers (aka metrics)
  • UNION/UNION ALL is used to bring together data belonging to the same entity from different systems
  • Window functions are used when you need to compare values between rows of your tables

Types of data writing

  • insert/append: adds to the table
  • overwrite: deletes existing data and inserts new data
  • update: changes existing data
  • MERGE/Upsert: one command to both update and /or insert rows as per logic

Systematic data processing with CTEs

  • Understand your inputs: grain, business unit/event, relationship type (1:1, 1:many, many:many, many:1), completeness of inputs
  • define what the output should be ^ + metrics and their calculation
  • CTE: filter & group by data to get all the necessary parts in the same grain join the necessary parts and group by (if necessary) midnful of duplication/ incorrect grain when group by ing to compute metrics
  • link: cte v temp

Optimizations

  • narrow and wide tx: in mem proc & data shuffle
  • paritition: partition pruning
  • sort: range filter
  • bucketing: partition for high cardinality cols
  • join strategy: use broadcast join
  • query plan: data execution in UI, how to read query plan and quickly identify bottleneck

4 Key Sql Transformation Patterns for Data Engineers

  • When & Which JOINS
  • When & How GROUP BY
  • Use case for UNIONS
  • Problems requiring Window Functions

4 Data Write Patterns and When to Use Them

  • Insert

  • Overwrite

  • Delete

  • Update

  • Mix: Update, Delete, Insert

Writing Clean Sql With CTEs

  • Input understanding
  • Defining output schema and metrics
  • CTE as a step-by-step process to go from Inputs -> Output

Optimizing Query Performance by Pre-Processing It

  • Partition Store
  • Bucketing
  • Sorting

Optimizing Data Processing by Reducing Shuffle and Scan

  • Broadcast Join
  • Storage Partition Join
Back to top