“Data Scientists neglect code quality”. While there is a bit of truth to that, we should always care about quality when we deal with code. This includes using design patterns for clean code as well as proper formatting and type hinting.
There are already plenty of tools for code quality, such as ruff, bandit, or mypy. But even with those installed, nothing forces us to actually run them on our codebase.
With pre-commit, you can run all the checks and formatting tools you want to make sure your code follows your development guidelines.
pre-commit hooks are one kind of Git hook. Git hooks are scripts that Git triggers automatically on specific events within a repository.
So, pre-commit hooks run before your changes are committed. You define the hooks that should run in a YAML file, and each of them runs against your changes or your whole codebase.
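To make the idea of a Git hook concrete, here is a minimal sketch of what a hand-written hook could look like without the pre-commit tool. It assumes the script is saved as .git/hooks/pre-commit and made executable. The function names and the simplified conflict-marker check are illustrative assumptions, not how pre-commit itself is implemented.

```python
import subprocess


def staged_files() -> list[str]:
    """Return paths that are added, copied, or modified in the staging area."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def has_conflict_marker(text: str) -> bool:
    """Return True if any line starts with a Git conflict marker."""
    return any(
        line.startswith(("<<<<<<< ", ">>>>>>> ")) for line in text.splitlines()
    )


def main() -> int:
    """Return the hook's exit status: 0 allows the commit, 1 aborts it."""
    failed = False
    for path in staged_files():
        # Simplification: this reads the working-tree file, not the staged blob.
        with open(path, encoding="utf-8", errors="replace") as handle:
            if has_conflict_marker(handle.read()):
                print(f"{path}: unresolved merge conflict marker")
                failed = True
    return 1 if failed else 0


# In an actual hook file, the script would end with: raise SystemExit(main())
```

Git aborts the commit whenever the hook exits with a non-zero status. pre-commit relies on the same mechanism, but manages the individual checks for you.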
In this article, you will learn how to set up pre-commit for your project with the most important hooks.
Set-Up Git Repository
To use pre-commit, we need a Git repository. Just create a new folder, go into it, and initialize Git.
$ mkdir testproject
$ cd testproject
$ git init
Initialized empty Git repository in C:/Code/testproject/.git/
Set-Up pre-commit
First, install pre-commit via pip, then install it into your Git hooks:
$ pip install pre-commit
$ pre-commit install
pre-commit needs a set of hooks to run against your code. For this, you need a YAML file named .pre-commit-config.yaml in the root of your repository. Let’s look at a simple sample config file for demonstration purposes and go through the lines.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
      - id: check-merge-conflict
      - id: end-of-file-fixer
      - id: check-added-large-files
      - id: double-quote-string-fixer
check-merge-conflict: Checks for files that contain merge conflict strings.
end-of-file-fixer: Makes sure files end in a new line.
check-added-large-files: Prevents huge files from being committed.
double-quote-string-fixer: Replaces double-quoted strings with single-quoted strings.
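To give a feeling for how small such a fixer hook can be, here is a rough sketch of the idea behind end-of-file-fixer. It is not the hook's actual source. The function name fix_end_of_file is made up for illustration, and unlike a real tool this version always writes a plain "\n" instead of preserving the file's original line-ending style.

```python
def fix_end_of_file(text: str) -> str:
    """Return text that ends in exactly one newline (empty input stays empty)."""
    stripped = text.rstrip("\r\n")
    if not stripped:
        # Nothing but line breaks: treat the file as empty.
        return ""
    return stripped + "\n"


original = 'print("Hello world!")'
fixed = fix_end_of_file(original)

# A fixer hook reports failure when it had to change a file,
# so the commit is stopped and you can review the modification.
was_modified = fixed != original
print(was_modified)
```

Fixer hooks follow this pattern: rewrite the file if needed and signal a failure if anything changed, which is why a "Failed" line can also mean "I fixed it for you".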
Let’s see it in action. Create a Python file (in my case, example.py) and paste in the following lines:
def hello_world():
    print("Hello world!")
Add + commit it:
$ git add .
$ git commit -m "Initial commit"
Now, pre-commit is triggered and will run all the defined hooks. It should look something like this:
Check for merge conflicts................................................Passed
Fix End of Files.........................................................Passed
Check for added large files..............................................Passed
Fix double quoted strings................................................Failed
- hook id: double-quote-string-fixer
- exit code: 1
- files were modified by this hook
Fixing strings in example.py
It ran all of our hooks automatically and even converted the double quotes to single quotes.
Easy!
Note: Because the hook modified your file, the commit was aborted. You have to add + commit again to include the fix.
Advanced pre-commit
As I said earlier, you can run popular code-checking and formatting tools like ruff, and there are hundreds of other predefined hooks you can use. A long list of them is available on the pre-commit website, but I will show you the most useful ones.
Here is the YAML file I usually use for projects:
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: mixed-line-ending
      - id: check-added-large-files
        args: ["--maxkb=1000"]
      - id: end-of-file-fixer
      - id: requirements-txt-fixer
      - id: check-yaml
      - id: check-json
      - id: check-toml
      - id: pretty-format-json
        args: ["--autofix"]
      - id: check-merge-conflict
      - id: check-case-conflict
      - id: check-docstring-first
  - repo: https://github.com/Lucas-C/pre-commit-hooks-bandit
    rev: v1.0.6
    hooks:
      - id: python-bandit-vulnerability-check
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: "v1.6.0"
    hooks:
      - id: mypy
  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: "v0.0.292"
    hooks:
      - id: ruff-format
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
  - repo: https://github.com/econchick/interrogate
    rev: 1.5.0
    hooks:
      - id: interrogate
        args: [--quiet, -i]
  - repo: https://github.com/PyCQA/docformatter
    rev: v1.7.5
    hooks:
      - id: docformatter
        additional_dependencies: [tomli]
        args: [--in-place]
The first section contains some general hooks you should consider for any project. For example, check-yaml checks that your YAML files are valid. The other sections add the following tools:
bandit: Checks your code for common security issues (e.g. requests made with certificate verification disabled).
mypy: Static type checker. If you added type hints, mypy checks whether values of the wrong type are passed to functions or classes.
ruff: A linter + code formatter on steroids. Orders of magnitude faster than other tools, and it also sorts your import statements automatically.
interrogate: It’s like pytest-cov, but for docstrings. It measures how many functions and classes have docstrings. -i tells it to ignore the __init__ methods of classes (the capital -I is the flag that skips __init__.py modules), and --quiet means it should not output anything. By default, the check fails if your docstring coverage is below 80%.
docformatter: Auto-formats your docstrings. --in-place means it makes changes to your file directly.
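To illustrate the kind of mistake mypy is meant to catch, here is a small sketch. The function total_price and its arguments are made up for illustration and are not part of the example project.

```python
def total_price(unit_price: float, quantity: int) -> float:
    """Return the price for `quantity` items (hypothetical example function)."""
    return unit_price * quantity


# Correct call: both arguments match the type hints.
print(total_price(2.5, 4))

# Python itself does not enforce type hints at runtime. A call such as
#     total_price("2.5", 4)
# would run without an exception and return the string "2.5" repeated four
# times. mypy is expected to reject that call as an incompatible argument
# type before the code ever runs.
```

Silent bugs like this are the reason a static type checker is worth running on every commit, provided the code actually carries type hints.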
If you make some changes to your Python file and add + commit again, you will see something like this:
check for added large files..............................................Passed
fix end of files.........................................................Passed
fix requirements.txt.................................(no files to check)Skipped
check yaml...............................................................Passed
check json...........................................(no files to check)Skipped
check toml...........................................(no files to check)Skipped
pretty format json...................................(no files to check)Skipped
check for merge conflicts................................................Passed
check for case conflicts.................................................Passed
check docstring is first.................................................Passed
bandit...................................................................Passed
mypy.....................................................................Passed
ruff.....................................................................Passed
black....................................................................Passed
interrogate..............................................................Failed
- hook id: interrogate
- exit code: 1
docformatter.............................................................Passed
We see that interrogate failed, because it requires 80% docstring coverage by default. That means your changes weren’t committed: pre-commit blocked the commit. You can check that by running git status.
pre-commit keeps us from committing code without docstrings because we configured it that way in our YAML file. Nice.
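To see why a file without docstrings ends up far below the 80% threshold, here is a rough sketch of how docstring coverage could be measured with Python's ast module. This is an approximation under my own assumptions (the module itself, classes, and functions each count as one item), not interrogate's actual implementation. The name docstring_coverage is hypothetical.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the percentage of modules, classes, and functions with a docstring."""
    tree = ast.parse(source)
    # ast.walk also yields the module node itself, so the list is never empty.
    nodes = [
        node
        for node in ast.walk(tree)
        if isinstance(
            node,
            (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef),
        )
    ]
    documented = sum(1 for node in nodes if ast.get_docstring(node) is not None)
    return 100.0 * documented / len(nodes)


undocumented = 'def hello_world():\n    print("Hello world!")\n'
print(docstring_coverage(undocumented))  # 0.0
```

With this way of counting, the module and the function are both undocumented, so coverage is 0%. Adding a module docstring and a function docstring brings it to 100%.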
Let’s add our docstrings there.
"""A module to print Hello World!."""


def hello_world():
    """A function to print Hello World!."""
    print("Hello World!")
Add + commit everything again, and see what pre-commit does:
trim trailing whitespace.................................................Passed
mixed line ending........................................................Passed
check for added large files..............................................Passed
fix end of files.........................................................Passed
fix requirements.txt.................................(no files to check)Skipped
check yaml...............................................................Passed
check json...........................................(no files to check)Skipped
check toml...........................................(no files to check)Skipped
pretty format json...................................(no files to check)Skipped
check for merge conflicts................................................Passed
check for case conflicts.................................................Passed
check docstring is first.................................................Passed
bandit...................................................................Passed
mypy.....................................................................Passed
ruff.....................................................................Passed
black....................................................................Passed
interrogate..............................................................Passed
docformatter.............................................................Passed
Everything passed. And finally, our changes were committed (verify again with git status).
Bonus Tip 1: Auto-Update pre-commit Hooks
You saw in our YAML file that we have to define versions for the hooks. But you can automatically update them via the following command:
$ pre-commit autoupdate
[https://github.com/pre-commit/pre-commit-hooks] already up to date!
[https://github.com/Lucas-C/pre-commit-hooks-bandit] already up to date!
[https://github.com/pre-commit/mirrors-mypy] updating v1.6.0 -> v1.9.0
[https://github.com/charliermarsh/ruff-pre-commit] updating v0.0.292 -> v0.3.2
[https://github.com/econchick/interrogate] already up to date!
[https://github.com/PyCQA/docformatter] already up to date!
Everything up-to-date now :)
Bonus Tip 2: Disable pre-commit
A sneaky way to skip pre-commit is to add --no-verify when committing:
$ git commit -m "Commit message" --no-verify
Not recommended. Use at your own risk!
Bonus Tip 3: Run pre-commit without Committing
If you want to run the hooks without making a commit, you can call pre-commit directly. Note that this still has to happen inside a Git repository:
$ pre-commit run --all-files
Conclusion
In this article, you learned how easy it is to take the (first) step towards clean(-er) code. There are hundreds of existing hooks you can use. But as with everything, more is not always better: too many hooks slow down every commit. And while pre-commit hooks are nice to have, they alone will not take your code from 0 to 100. Design patterns, SOLID, DRY, etc. are more important to focus on.