Source Code for this post can be found here.
Introduction
As a Data Scientist, testing your code isn't maybe your strength. But it helps you in making your code more reliable and safe during code changes.
One thing some developers love is to have high code coverage.
Code coverage is a metric measuring whether a line of code was executed or not during tests.
A big problem with code coverage is that it doesn't measure the quality of tests.
So, you could have 100 % coverage without writing any high-quality tests.
Suppose we have the following revolutionary function where we add two numbers:
# src/addition.py
def add_numbers(a, b):
return a + b
Now, based on our superior domain knowledge, we write a test to see if our function behaves as expected
# tests/test_addition.py
from src.addition import add_numbers
def test_add_numbers():
assert add_numbers(1, 0) == 1
This has a 100 % code coverage, crazy. 🤯
Feature done. ✅
Now imagine: A few weeks later, a new developer joins your team and has to implement new features for the application.
By accident, he changed the + sign of the add_numbers()
function to a -. 😢
He runs all the tests, which are passing, and calculates the code coverage too, which is still 100 % for add_numbers()
.
Do you see the mistake?
Our small unit test for add_numbers()
wasn't strong enough.
1 + 0 is 1. But, after the change of the new developer, 1 - 0 is still 1 and the test still passes.
How can we automatically run these subtle changes to our codebase to check if our tests still pass or fail?
Enter mutation testing.
What is Mutation Testing?
Mutation testing does what we want - It makes small changes (also called mutants) to your codebase and runs your tests against it.
If a mutant survives (= all of your tests passed), this is a sign to revisit your tests. If at least one test fails, your test covers the desired behavior.
Mutation Testing tests your unit tests.
How do you do automated mutation testing in Python?
mutmut is a great Python library for mutation testing.
Install it with pip install mutmut
and use it for our example above.
Our project structure is as follows (we didn’t change the source code from our example above):
├── src
│ └── addition.py
└── tests
└── test_addition.py
Let's run mutmut
with the following command:
$ mutmut run --paths-to-mutate "src/" --tests-dir "tests/"
And let’s take a look at the output:
1. Running tests without mutations
⠼ Running...Done
2. Checking mutants
⠋ 1/1 🎉 0 ⏰ 0 🤔 0 🙁 1 🔇 0
This is what the output means:
🎉 Killed mutants. The goal is for everything to end up in this bucket.
⏰ Timeout. Test suite took 10 times as long as the baseline so were killed.
🤔 Suspicious. Tests took a long time, but not long enough to be fatal.
🙁 Survived. This means your tests need to be expanded.
🔇 Skipped. Skipped.
One mutant is in 🙁, which is not desired.
If we want to see in which files the mutants survived, run mutmut show
which gives you the file name and the corresponding id of the mutant.
$ mutmut show
Survived 🙁 (1)
---- src/addition.py (1) ----
We only have one script so we know where the surviving mutant lives.
To see what the mutant looks like run mutant show <id>
(in our case, the id is 1).
$ mutmut show 1
--- src/addition.py
+++ src/addition.py
@@ -1,2 +1,2 @@
def add_numbers(a, b):
- return a + b
+ return a - b
Aha! We see that mutmut
changed the + to a -, like we did earlier.
Let's add another assert to catch this behavior:
# tests/test_addition.py
from src.addition import add_numbers
def test_add_numbers():
assert add_numbers(1, 0) == 1
assert add_numbers(3, -1) == 2
And run mutmut
again:
$ mutmut run --paths-to-mutate "src/" --tests-dir "tests/"
1. Running tests without mutations
⠇ Running...Done
2. Checking mutants
⠋ 1/1 🎉 1 ⏰ 0 🤔 0 🙁 0 🔇 0
We eliminated the mutant with success!
If you still have a surviving mutant, you should delete your
.mutmut-cache
file, whichmutmut
automatically creates.
Conclusion
We learned how unreliable and misleading code coverage is, and how mutation testing helps us to improve the quality of our tests.
One downside of mutation testing is its long-running process, especially for larger code bases. But it's still a nice tool to have under your belt.