Advanced SQL is knowing how to model data & retrieve it effectively

1. Introduction

Most data engineering job descriptions these days expect “knowledge of advanced SQL,” but ask any data engineer what that means, and you will get a different answer every time.

Are you

Frustrated that “advanced SQL” ebooks or Udemy courses aren’t all that advanced?

Wondering where you can learn more about writing advanced SQL queries?

Frustrated by job descriptions requiring “Advanced SQL”, but no one knows what that means?

It can be especially demotivating when seeking a new data engineering job, as the goalposts for landing a data job are constantly shifting.

Imagine being able to build systems that enable your stakeholders to use data to make decisions efficiently. In this post, you will learn the key advanced SQL techniques, how to optimize data processing, and how to structure data for ease of use.

Advanced SQL involves knowing how to model data (data modeling) and retrieve it effectively (SQL techniques and query optimization).

2. SQL techniques

The techniques most commonly considered advanced SQL are

  1. Window functions
  2. CTEs (common table expressions)
  3. JOIN types
  4. MERGE INTO
  5. EXPLODE / CROSS JOIN
  6. Aggregates beyond the standard ones

In addition to knowing that these features exist, it’s also crucial to understand when to use them.
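As a minimal sketch of the first two techniques, here is a CTE feeding a window function, run through Python's built-in sqlite3 module (the `sales` table and its values are hypothetical, just for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (store TEXT, sale_date TEXT, revenue REAL);
INSERT INTO sales VALUES
  ('north', '2024-01-01', 100),
  ('north', '2024-01-02', 150),
  ('south', '2024-01-01', 200),
  ('south', '2024-01-02', 50);
""")

# The CTE rolls revenue up per store and day; the window function then
# ranks stores within each day WITHOUT collapsing rows the way a second
# GROUP BY would -- that is the typical reason to reach for windows.
query = """
WITH daily AS (
  SELECT store, sale_date, SUM(revenue) AS revenue
  FROM sales
  GROUP BY store, sale_date
)
SELECT store, sale_date, revenue,
       RANK() OVER (PARTITION BY sale_date ORDER BY revenue DESC) AS day_rank
FROM daily
ORDER BY sale_date, day_rank;
"""
for row in conn.execute(query):
    print(row)
```

The same query runs largely unchanged on warehouse engines such as BigQuery or Snowflake; sqlite3 is used here only because it ships with Python.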

3. Query optimization

The key to query optimization is reducing the amount of data to be processed and/or increasing the utilization of cluster resources.

  1. Narrow & wide transformations
  2. Partitioning & Clustering
  3. Query planner
  4. Data skew avoidance

You need to know how to store data for optimized reads and how to configure your job to use all the available cluster resources.
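Partitioning and clustering are warehouse-specific features, but the underlying idea, letting the planner prune data instead of scanning everything, can be sketched with an index and EXPLAIN QUERY PLAN in sqlite3 (table name and data are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(f"2024-01-{d:02d}", u, "x") for d in range(1, 31) for u in range(100)],
)

q = "SELECT COUNT(*) FROM events WHERE event_date = '2024-01-15'"

# Before indexing: the planner has no choice but a full table scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + q).fetchall()
print(plan_before[0][3])

# An index on the filter column lets the planner skip irrelevant rows --
# the same pruning idea behind partitioning/clustering in a warehouse.
conn.execute("CREATE INDEX idx_events_date ON events (event_date)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + q).fetchall()
print(plan_after[0][3])
```

Inspecting the plan before and after a storage change is the habit that matters; the exact tooling (EXPLAIN, query profiles, Spark UI) varies by engine.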

4. Data modeling & data flow

There are five key types of tables; in almost all cases, modeling your data with them consistently will solve 99.99% of your data problems.

  1. Dimension tables store data for a business entity (e.g., customer, product, partner, etc). These tables answer the ‘who’ and ‘what’ types of questions. For example, which stores had the highest revenue yesterday? In this question, stores are the dimension.
  2. Fact tables contain information about how dimensions interact with each other in real life. Example: an order fact is an interaction between a customer and a seller involving one or more products.
  3. Bridge tables are used to represent many-to-many relationships between dimension tables. Example: Account-Customer bridge table where one account can have multiple customers and one customer can have multiple accounts.
  4. One Big Table (OBT) tables are fact tables to which all of their dimension tables are left joined. This eliminates joins and simplifies analysis for business users, albeit at the expense of storage efficiency and increased maintenance complexity. Example: a sales OBT with transaction details plus customer name, product description, and store location all in one table.
  5. Summary/Aggregate tables are pre-calculated tables storing metrics rolled up to a coarser grain than the base fact tables. They improve query performance for common reporting patterns by avoiding expensive aggregation at runtime. Example: a monthly sales summary by region instead of querying daily transaction details each time.
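The dimension, fact, and summary patterns above can be sketched in a few statements; the table and column names here (`dim_store`, `fct_orders`, `agg_daily_region_sales`) are illustrative conventions, not a prescribed standard, and sqlite3 again stands in for a real warehouse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension: one row per store (the 'who/what').
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, store_name TEXT, region TEXT);

-- Fact: one row per order, referencing the dimension by key.
CREATE TABLE fct_orders (
  order_id   INTEGER PRIMARY KEY,
  store_id   INTEGER REFERENCES dim_store (store_id),
  order_date TEXT,
  amount     REAL
);

INSERT INTO dim_store VALUES (1, 'Downtown', 'east'), (2, 'Airport', 'west');
INSERT INTO fct_orders VALUES
  (10, 1, '2024-01-01', 25.0),
  (11, 1, '2024-01-01', 75.0),
  (12, 2, '2024-01-01', 40.0);

-- Summary/aggregate: pre-rolled-up metrics at a coarser grain
-- (region + day) so reports skip the join-and-aggregate at runtime.
CREATE TABLE agg_daily_region_sales AS
SELECT d.region, f.order_date, SUM(f.amount) AS revenue
FROM fct_orders f
JOIN dim_store d ON d.store_id = f.store_id
GROUP BY d.region, f.order_date;
""")

for row in conn.execute("SELECT * FROM agg_daily_region_sales ORDER BY region"):
    print(row)
```

An OBT would be built the same way as the summary table, just left-joining every dimension onto the fact without the GROUP BY.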

Most data teams follow the 3-hop architecture (but every team does it subtly differently); read this for details on how to use the 3-hop architecture.

5. Conclusion

To recap, we saw:

  1. Advanced SQL techniques
  2. Query optimization techniques
  3. Data modeling and data flow patterns

Learn these for interviews and real-world use cases; they will help you leverage decades of existing research and solve most of the common issues plaguing data teams.

Please let me know in the comment section below if you have any questions or comments.

6. Further reading

  1. 25 SQL techniques
  2. Data flow best practices
  3. Interview preparation series: SQL
  4. Messy SQL -> Modular code
  5. Using nested data in SQL

If you found this article helpful, share it with a friend or colleague using one of the socials below!

Land your dream Data Engineering job!

Overwhelmed by all the concepts you need to learn to become a data engineer? Have difficulty finding good data projects for your portfolio? Are online tutorials littered with sponsored tools and not foundational concepts?

Learning data engineering can be a long and rough road, but it doesn't have to be!

Pick up any new tool/framework with a clear understanding of data engineering fundamentals. Demonstrate your expertise by building well-documented real-world projects on GitHub.

Sign up for my free DE-101 course that will take you from basics to building data projects in 4 weeks!

    We won't send you spam. Unsubscribe at any time.


    Join now and get started on your data engineering journey!

      Testimonials:

      I really appreciate you putting these detailed posts together for your readers, you explain things in such a detailed, simple manner that's well organized and easy to follow. I appreciate it so so much!
      I have learned a lot from the course which is much more practical.
      This course helped me build a project and actually land a data engineering job! Thank you.

      When you subscribe, you'll also get emails about data engineering concepts, development practices, career advice, and projects every 2 weeks (or so) to help you level up your data engineering skills. We respect your email privacy.