How to become a valuable data engineer

1. Introduction

So you are a new data engineer (or looking for a DE job) and want to get better at your craft. However, when you look at job postings or company tech stacks, you are overwhelmed by the sheer number of tools you are expected to learn, which can lead to analysis paralysis. If you are

Feeling lost and confused by data engineering as a whole

Feeling like you are just winging it or getting lost every day

Wanting to improve as a DE but not knowing where to start

Then this post is for you. In this post, we will review what you can do to increase your value as a DE. We will look at what people actually mean by business impact and how technology helps you drive it. Finally, we will go over a simple formula you can use to choose a high-impact project that delivers significant business value.

2. Skills

Two main things make you a valuable DE.

2.1. Business Impact

The work that you do as a DE should have an impact on the business. Data teams typically have KPIs, OKRs, etc., that you can work towards improving. Understanding the idea behind your team's KPIs will help you develop efficient ways to improve them.

The two fundamental concepts that help you understand business impact are:

2.1.1. Know your business

You have to understand how the company you work for makes money and what your end users care about. Some questions to get answers to would be:

  1. How many products does your company have?
  2. Who are its customers?
  3. What profit margin is your company operating at? Or is it still in its growth phase?
  4. Which teams use the data that you generate?
  5. What do your end-user teams use your data for?
  6. What metrics do your end-user teams care about the most?
  7. What is the business process that generates the data that you serve to your end users?
  8. What are the usual business issues with your upstream sources? E.g., system downtime, missing business rules in the application layer, etc.

Having a good understanding of the above questions is critical when building projects. Your understanding of the above points will evolve as you work on more projects.

2.1.2. Money & Time

Most data projects either help make (or save) money or save time on future processes.

Making money: Projects of this type aim to increase company revenue (via a new product offering) or save money (by identifying low-ROI ads and focusing on higher-ROI acquisition channels). These projects typically impact one or more of the critical metrics of your end-user teams.

Saving time: Projects of this type help your end-user teams or data team move faster. A project of this type could be anything from building a dashboard that lets end users see metrics instead of calculating them every single time, to building data quality systems so that end users are not caught up checking the validity of the data (which slows them down), to building data pipeline patterns that help non-DEs build pipelines.

2.2. Technical skills

While business knowledge helps you determine what and why to work on a project, your technical skills will help you figure out how to work on a project.

While the data engineering ecosystem grows every day, there are a few fundamental concepts that, once you understand them, can help you quickly learn any new tech. They are:

  1. Data storage: Distributed data storage, partitioning, clustering, column encoding, & table formats.
  2. Data processing: Data shuffling, in-memory processing, and query planners.
  3. Data modeling: The Data Warehouse Toolkit.
  4. Cloud basics: For a cloud provider, understand the ways to interact with its cloud storage (S3) and data warehouse (Redshift) using its Python API (boto3).
  5. Data quality patterns: Adding tests to your data pipeline, automating CI testing with GitHub Actions, & setting up end-to-end testing.
  6. Coding patterns: Data engineering design patterns & coding patterns.
  7. Orchestration & scheduling: Apache Airflow concepts.
  8. Alerting & monitoring: Prometheus concepts.
  9. Data discovery: DataHub concepts.
  10. Data access control/permissions: Snowflake object access control.
  11. Data readers: Dashboards & APIs.
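To make the data quality item from the list above concrete, here is a minimal sketch of a pipeline check in plain Python. The column names and rule are illustrative assumptions; in practice you might express the same check with a framework such as dbt tests or Great Expectations.

```python
def check_no_nulls(rows: list[dict], column: str) -> list[str]:
    """Return a list of failure messages; an empty list means the check passed."""
    failures = []
    for i, row in enumerate(rows):
        if row.get(column) is None:
            failures.append(f"row {i}: null value in '{column}'")
    return failures


# Illustrative dataset with one bad row.
orders = [
    {"order_id": 1, "amount": 9.99},
    {"order_id": 2, "amount": None},
]

failures = check_no_nulls(orders, "amount")
if failures:
    # In a real pipeline, you would fail the task or fire an alert here.
    print("Data quality check failed:", failures)
```

The key design point is that the check returns its failures rather than raising immediately, so the pipeline can decide whether to halt, warn, or quarantine bad rows.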

Note: You do not need to be an expert in every topic listed above to use them. Expertise will develop over time as you encounter challenges in each project.

When you work on a project, don't start with the tech; work backward from the requirements to identify what is necessary. Working backward means listing the steps required to produce the output.

For example, let’s say that you are building a project to deliver a data set to non-technical users. Here’s an example of what working backward may look like:

  1. Understand the requested data and which business processes generate it. You will also need to understand the key metrics that the end-users are looking for in this dataset and how fresh the data will need to be. [Business metrics & process]
  2. Build a dashboard that displays those key metrics for non-technical users to explore the data. [Data reader, Data access]
  3. Build the dataset in your warehouse and ensure it’s modeled properly to avoid slow and expensive queries. [Data modeling, storage, & processing, Data quality]
  4. Bring the required upstream data set(s) into your warehouse. [Know your business, Coding]
  5. Clean and transform the upstream data set(s) as required. [Coding]
  6. Schedule it at a specific frequency (daily, hourly, etc.). [Orchestration & Scheduling]
  7. Figure out what will happen when the pipeline breaks. [Alerting & Monitoring]

The above example shows how a standard project can help you develop expertise in multiple topics.
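The pipeline implied by the steps above can be sketched in plain Python. The function bodies and sample data here are stand-ins, not a real implementation; in practice, each function would map to a task in your orchestrator (e.g., an Airflow DAG), which would also handle the scheduling and alerting steps.

```python
def extract() -> list[dict]:
    """Step 4: bring the upstream data set into the warehouse (stubbed here)."""
    return [{"order_id": 1, "amount": " 9.99 "}, {"order_id": 2, "amount": "15.00"}]


def transform(rows: list[dict]) -> list[dict]:
    """Step 5: clean and type-cast the upstream data."""
    return [
        {"order_id": r["order_id"], "amount": float(str(r["amount"]).strip())}
        for r in rows
    ]


def load(rows: list[dict]) -> None:
    """Step 3: write the modeled dataset to the warehouse (stubbed as a print)."""
    print(f"Loaded {len(rows)} rows")


def run_pipeline() -> None:
    # Steps 6 and 7 (scheduling, alerting & monitoring) live in the
    # orchestrator that wraps this call, not inside the pipeline itself.
    load(transform(extract()))


run_pipeline()
```

Even in this toy form, the structure makes each working-backward step a separate, testable unit.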

3. Build impactful projects

The value of a data engineer (or any engineer) is determined by the impact they deliver. But what does impact even mean? This section will review what impact is and the steps you can follow to provide impact.

If your company has the option for you to choose what to work on, use the steps below to find a project that can deliver maximum impact.

  1. Identify your team’s or company’s key metrics (KPI/OKR, etc).
  2. Assign each metric a weight between 1 and 5 based on its importance. Let's call this metric_weight.
  3. For each metric, identify or develop a list of projects that can improve it.
  4. For each project,
    1. Calculate the hypothetical metric improvement by doing this project. Let’s call this metric_change_perc.
    2. Identify the time to build the project (in months). Let’s call this timeline_measure.
    3. Calculate impact as impact_measure = (metric_weight * metric_change_perc) / timeline_measure.
  5. Go to step 3 and repeat until no more metrics are left.
  6. Choose the top 5 projects with the highest impact_measure and work with your manager & team members to identify what to work on.
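The scoring steps above can be sketched as a short Python snippet. The project names, weights, and estimates are made up purely for illustration; the point is the ranking mechanics.

```python
def impact_measure(metric_weight: int, metric_change_perc: float, timeline_months: float) -> float:
    """impact_measure = (metric_weight * metric_change_perc) / timeline_measure"""
    return (metric_weight * metric_change_perc) / timeline_months


# (project, metric_weight 1-5, estimated % metric improvement, months to build)
projects = [
    ("Deduplicate orders pipeline", 5, 12.0, 2),
    ("Self-serve marketing dashboard", 3, 25.0, 3),
    ("Migrate legacy cron jobs", 2, 5.0, 1),
]

# Rank candidate projects from highest to lowest impact.
ranked = sorted(projects, key=lambda p: impact_measure(p[1], p[2], p[3]), reverse=True)

for name, weight, change, months in ranked:
    print(f"{name}: impact={impact_measure(weight, change, months):.1f}")
```

Because the timeline sits in the denominator, a quick project with a modest improvement can outrank a slow project with a larger one, which matches the intent of the formula.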

This system is not perfect, but using it in conversations with managers and teammates will give you an idea of which projects to concentrate on to deliver the most impact for your company, thus making you more valuable.

4. Conclusion

To recap, in this post we covered:

  1. What makes a valuable DE
  2. Importance of understanding business metrics
  3. The technical foundations to understand
  4. Identifying projects that have a high impact

As you build your work experience, remember to add your projects to your resume following the STAR method to state your impact. If you follow the steps above, you will be able to demonstrate real business impact from your projects, significantly improving your value as a data engineer.

The next time you feel overwhelmed by everything you think you have to learn to be a valuable DE, remind yourself that your value as an engineer lies in making the company money and helping others save time. Work backward from there, and your career will be fun and impactful!

If you have any questions or comments, please leave them in the comment section below.

5. Further reading

  1. Requirements gathering for data pipelines
  2. Responsibilities of a data engineer
  3. How to choose your data tools
  4. Understand and deliver on your DE task
  5. Re-engineering legacy data pipelines
  6. 10 skills to ace your DE interview
  7. Bragsheet

Please consider sharing; it helps out a lot!