10 Skills to Ace Your Data Engineering Interview

Introduction

If you are preparing for a data engineering interview and are overwhelmed by all the tools and concepts you need to learn. Then this post is for you, in this post we go over the most common tools and concepts you need to know to ace any data engineering interview.

If you are working on landing a data engineering interview read a proven approach to landing a DE job.

Skills

The skills are ordered based on how frequently they are asked in interviews. The points links to different posts/docs that explain the concept in detail.

1. SQL

SQL is crucial for any data engineering interview. These questions generally involve the interviewer giving you an ERD and asking you to write queries to answer analytical questions. Read about the below concepts and practice leetcode problems to ace this section.

2. Python

Python is the most common language used in building data engineering pipelines. Know the data structures and basics.

3. Data structures and algorithms

This is a standard section for any software job. Read the patterns and practice the common asked problems.

4. Data Modeling

You will be tested on data modeling concepts, such as star schema and will be asked why a certain methodology maybe better in a given scenario. Read the below concepts, understand what they are, why they are needed, when to use them and when not to use them.

4.1 Data Warehousing

4.2 OLTP

5. Data Pipelines

This will be a part of the system design question where they ask you to design a data pipeline and will test the pipeline with potential source system failures, incorrect dependencies, backfilling and check for efficient data modeling for data access by users.

6. Distributed system fundamentals

7. Distributed Queue

8. System Design

This is a standard question for most software jobs. Specifically for a data engineering position, you will be asked about building data pipelines where you will have to understand the source system, ETL schedule requirements, data usage pattern and design a data pipeline to pull data into a warehouse. Some common questions in this section are how do you design a clickstream data storage system, log storage and how would you use CDC pattern to replicate OLTP tables into a data warehouse. Some companies may choose to ask a standard system design question like designing twitter, netflix, etc.

9. Business Value

This is a key point for experienced people. You will need to justify how you work impacted the money/KPI/OKR/..or other metric that drove business value. See the STAR method for how to organize your response to such questions.

10. Cloud computing

Be familiar with what cloud computing is. The interviewer will check if you know some commonly used components such as

You can find corresponding components for GCP or Azure as well.

11. Probabilistic Data structures (Optional)

Although not frequently asked, these are asked sometimes. You may be asked what is an efficient way to count the number of distinct ids in a distributed system where some probability of error is fine.

Optional: Depending on the specific role a JVM language Java or Scala expertise

Interview Preparation Tips

When you are preparing for any interview there are 4 main aspects

  1. Coding skills: Points 1, 2, 3. You will need to practice these problems and be able to write code in your interview.
  2. Core concepts: Points 4 - 9. You will need to know enough to get through an interview. For example if you consider columnar storage format you will need to know what it is, why, when to use it, what its benefits are and when not to use it. But not necessarily the exact implementation details, like what the metadata format in a columnar format such as Apache Parquet is. These are essential to know for the job, but given that an interview is about 1h the probability of getting to such lower level details may be very low.
  3. System Design skills: This will require you to use the concepts you learn from points 4 - 9 and come up with a good solution for the question asked. When faced with a design question make sure to clear any ambiguous requirements before starting to come up with a solution.
  4. Communication skills: When you are taking about your previous experiences use the STAR method to keep you response focussed and clear.

If you are pressed for time use the below to concentrate on high priority skills

  1. Points 1 - 4, 7: Absolutely necessary, almost all levels of Data engineering interviews need this.
  2. Points 5 - 10: If you are interviewing for a position that involves designing ETL systems, modeling data warehouses. You will be tested on this.
  3. Point 11 + JVM: These are good to know, some companies check if you know these topics.

Conclusion

Hope this list helps you ace your data engineering interview. Good luck, if you have any questions or comments leave them in the comment section below.

References

  1. Wikimedia