Visual Studio Code (VSCode) extensions for data engineers

Author

Joseph Machado

Published

February 16, 2025

Keywords

beginner, visual studio code, IDE, devex

1. Introduction

Whether you are setting up Visual Studio Code for your colleagues or want to improve your own workflow, tons of extensions are available. If you have wondered

What are the best Visual Studio Code extensions for data engineers?

How do I share my Visual Studio Code environment with my colleagues?

How do Visual Studio Code user/workspace settings, devcontainers, and profiles work?

Then this post is for you!

Imagine being able to quickly set up Visual Studio Code on any laptop exactly how you want it. You won’t notice that you are coding on a different machine!

In this post, we will go over Visual Studio Code’s settings hierarchy, how to set up Visual Studio Code on any machine exactly to your liking with profiles, useful extensions for data engineering, and the caveats of unrestricted extensions.

By the end of this post, you will have set up Visual Studio Code exactly how you like it and be able to share it with other data engineers. Let’s get started.

TL;DR If you want a setup for data engineering, follow this short video in your project directory:

2. Python environment setup

Before we set up Visual Studio Code, we will install Python and use a virtual environment to keep things tidy.

# Install uv: https://docs.astral.sh/uv/getting-started/installation/#standalone-installer

# Install the Python version for your project
uv python install 3.13
# Create a project
uv init my-data-pipelines
cd my-data-pipelines
# Run a script; the first run creates a virtual environment at .venv
uv run main.py
# Install libraries
uv add polars

Libraries and their versions are stored in the pyproject.toml file.

[project]
name = "my-data-pipelines"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "polars>=1.22.0",
]

3. VSCode Primer

[Image: VSCode User Workspace Profile]

Before we dig into setting up extensions, it’s helpful to understand the key components for setting up Visual Studio Code exactly to your preference:

  1. User & Workspace settings: Change how Visual Studio Code works using its settings. You can define settings for your entire machine (aka user) and project-specific settings (aka workspace). Note that your workspace settings will override your user settings.
  2. Extensions: Tools (paid & free) available via the Visual Studio Code Marketplace. Extensions add functionality.
  3. Profiles: You can bundle a list of extensions and settings into a profile that can be shared and used by anyone with Visual Studio Code. Profiles let dev teams quickly get the same IDE experience. Here is the link to my Data Engineering Profile. Import this link as shown below.
  4. Snippets: Snippets are short prefixes that expand into boilerplate code. Open the list of available snippets with Ctrl + Shift + P -> Snippets: Configure Snippets. Let’s look at an example to generate a try/except/else/finally block with a teef snippet.
  5. Devcontainers: Devcontainers enable you to develop inside Docker containers with VSCode. Define your extensions, settings, profiles, etc. in the devcontainer.
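To make the snippet idea concrete, here is a rough sketch of what a teef definition could look like. VSCode snippet files are JSON with a prefix, a body (one string per line), and a description; project-level ones live in a .vscode/*.code-snippets file. The body below is my own guess at a sensible try/except/else/finally template, not necessarily the exact snippet from my profile, and the function name build_teef_snippet is made up for this example.

```python
import json


def build_teef_snippet() -> dict:
    """Build a VSCode snippet definition that expands `teef` into a try block.

    The body is a hypothetical template; ${1:...} style markers are tab stops
    you jump through with Tab, and $0 marks the final cursor position.
    """
    return {
        "try/except/else/finally": {
            "prefix": "teef",
            "scope": "python",
            "body": [
                "try:",
                "\t${1:pass}",
                "except ${2:Exception} as ${3:e}:",
                "\t${4:raise}",
                "else:",
                "\t${5:pass}",
                "finally:",
                "\t${0:pass}",
            ],
            "description": "try/except/else/finally block",
        }
    }


if __name__ == "__main__":
    # Paste the printed JSON into e.g. .vscode/python.code-snippets
    print(json.dumps(build_teef_snippet(), indent=2))
```

Typing teef in a Python file and accepting the suggestion should then expand into the full block.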

Devcontainers enable you to work directly on the files inside the Docker container with the Visual Studio Code you are used to. Here is a sample devcontainer config that I use to install the Jupyter and Python extensions and install requirements inside the container with pip install.
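As a rough illustration of such a config (a sketch, not my exact file), the script below writes a .devcontainer/devcontainer.json that installs the Python and Jupyter extensions and runs pip install after the container is created. The base image, the project name, and the presence of a requirements.txt are all assumptions; swap in your own.

```python
import json
from pathlib import Path


def write_devcontainer(project_dir: Path) -> Path:
    """Write a minimal devcontainer.json under project_dir/.devcontainer."""
    config = {
        "name": "my-data-pipelines",
        # Assumed base image; use whatever image your team standardizes on.
        "image": "mcr.microsoft.com/devcontainers/python:3.13",
        "customizations": {
            "vscode": {
                # Extensions installed inside the container.
                "extensions": ["ms-python.python", "ms-toolsai.jupyter"],
            }
        },
        # Runs once after the container is created (assumes requirements.txt exists).
        "postCreateCommand": "pip install -r requirements.txt",
    }
    target = project_dir / ".devcontainer" / "devcontainer.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2) + "\n")
    return target


if __name__ == "__main__":
    print(f"Wrote {write_devcontainer(Path.cwd())}")
```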

4. Extensions overview

While there are a lot of extensions available, let’s look at the typical high-value ones:

1. GitLens

Visualize Git changes in VSCode.

2. Python testing and debugging

Execute Python tests with the option to debug them.
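For example, assuming you use pytest as the test framework (the Python extension also supports unittest), a file like the hypothetical tests/test_normalize.py below is enough for the Testing panel to list each test with run and debug buttons. The function normalize_email is made up for this illustration.

```python
def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and lower-case an email address."""
    return raw.strip().lower()


# pytest collects functions whose names start with `test_`.
def test_normalize_email_strips_and_lowercases():
    assert normalize_email("  Jane.Doe@Example.COM ") == "jane.doe@example.com"


def test_normalize_email_keeps_clean_input():
    assert normalize_email("a@b.io") == "a@b.io"
```

Set a breakpoint inside normalize_email and choose the debug option on a test to step through it.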

3. Ruff

Automatically clean up your code and format it.
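To have Ruff format and fix a file every time you save, you can set it as the Python formatter in your workspace settings. The sketch below writes those keys into .vscode/settings.json. It assumes the file, if it already exists, is plain JSON without comments, and the helper name write_workspace_settings is my own.

```python
import json
from pathlib import Path

# Settings scoped to Python files only.
RUFF_WORKSPACE_SETTINGS = {
    "[python]": {
        "editor.defaultFormatter": "charliermarsh.ruff",
        "editor.formatOnSave": True,
        "editor.codeActionsOnSave": {
            "source.fixAll": "explicit",
            "source.organizeImports": "explicit",
        },
    }
}


def write_workspace_settings(project_dir: Path) -> Path:
    """Add the Ruff settings to project_dir/.vscode/settings.json."""
    target = project_dir / ".vscode" / "settings.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Keep unrelated keys; an existing "[python]" block is replaced wholesale.
    existing = json.loads(target.read_text()) if target.exists() else {}
    existing.update(RUFF_WORKSPACE_SETTINGS)
    target.write_text(json.dumps(existing, indent=2) + "\n")
    return target


if __name__ == "__main__":
    print(f"Updated {write_workspace_settings(Path.cwd())}")
```

Because these are workspace settings, they apply to this project only and take precedence over your user settings.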

4. SQLTools

Connect to most databases and format SQL code.

5. Jupyter Notebook

Run Jupyter notebooks inside VSCode.

6. Data Wrangler

Interactively transform your data and generate pandas transformation code.

7. autoDocstring

Generate documentation for your class/function by typing """ under its definition.
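The extension generates a docstring skeleton with placeholder text for the summary, arguments, and return value, which you then fill in. As far as I know it defaults to the Google docstring style; the function below is a made-up example showing what the result might look like once the placeholders are replaced.

```python
def dedupe_records(records: list[dict], key: str) -> list[dict]:
    """Drop records whose key value has already been seen.

    Args:
        records (list[dict]): Input rows, one dict per row.
        key (str): Field used to decide whether two rows are duplicates.

    Returns:
        list[dict]: Rows in their original order, keeping the first row per key value.
    """
    seen = set()
    unique = []
    for record in records:
        value = record[key]
        if value not in seen:
            seen.add(value)
            unique.append(record)
    return unique


if __name__ == "__main__":
    rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
    print(dedupe_records(rows, "id"))
```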

8. Rainbow CSV

Sometimes, you want to inspect a CSV without having to use cut or other such tools.

9. dbt Power User

Run dbt commands via the UI, and render lineage and docs inside Visual Studio Code.
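Besides profiles, one lightweight way to nudge teammates toward the same extensions is a .vscode/extensions.json file with a recommendations list, which VSCode offers to install when the project is opened. The sketch below writes one for the extensions above. The extension IDs are the Marketplace identifiers as I remember them, so please verify each one before committing the file; the helper name is my own.

```python
import json
from pathlib import Path

# Marketplace IDs as best I recall them; double-check before relying on them.
RECOMMENDED_EXTENSIONS = [
    "eamodio.gitlens",
    "ms-python.python",
    "charliermarsh.ruff",
    "mtxr.sqltools",
    "ms-toolsai.jupyter",
    "ms-toolsai.datawrangler",
    "njpwerner.autodocstring",
    "mechatroner.rainbow-csv",
    "innoverio.vscode-dbt-power-user",
]


def write_extension_recommendations(project_dir: Path) -> Path:
    """Write project_dir/.vscode/extensions.json with the recommended list."""
    target = project_dir / ".vscode" / "extensions.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    payload = {"recommendations": RECOMMENDED_EXTENSIONS}
    target.write_text(json.dumps(payload, indent=2) + "\n")
    return target


if __name__ == "__main__":
    print(f"Wrote {write_extension_recommendations(Path.cwd())}")
```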

5. Privacy, Performance, and Cognitive Overload

I recommend understanding a tool in depth (read the docs and settings) and using it for a few months before adding more. Adding extensions indiscriminately can lead to cognitive overload.

Security of extensions: Most extensions are not verified. Microsoft offloads the responsibility to the user with this prompt:

[Screenshot: Security nightmare]

Performance cost: Every extension is a TypeScript/JavaScript app running in the background.

[Screenshot: Performance concerns]

In the above screenshot, code . represents VSCode. Note all the sub-processes (each with its own process ID, or PID) that get created and their memory usage!
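If you want a rough number for your own machine, the sketch below sums the resident memory of processes whose command line contains "code". It assumes Linux or macOS with the ps command available, and the substring match is deliberately crude (it will also count unrelated processes with "code" in their command line), so treat the result as a ballpark figure.

```python
import subprocess


def vscode_memory_mib(ps_output: str, needle: str = "code") -> tuple[int, float]:
    """Count matching processes and sum their resident memory in MiB.

    Expects lines of the form `<pid> <rss in KiB> <command line>`.
    """
    count, total_kib = 0, 0
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, rss, full command line
        if len(parts) < 3 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        if needle in parts[2].lower():
            count += 1
            total_kib += int(parts[1])
    return count, total_kib / 1024


if __name__ == "__main__":
    # `name=` suppresses the header for that column.
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "rss=", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    count, mib = vscode_memory_mib(out)
    print(f"{count} matching processes using about {mib:.0f} MiB")
```

Running it once with your usual extensions enabled and once with them disabled gives a feel for what the extensions cost.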

6. Conclusion

VSCode is an excellent IDE, primarily due to its extensive list of extensions. If you are unhappy with your setup and feel it could be better, use this Data Engineering Profile.

If you use Neovim, check out my Neovim config here.

What other extensions do you recommend? Please let me know in the comment section below.

