Good software engineering practices bring long-term benefits. For example, unit tests help you maintain large codebases and verify that a specific piece of your code behaves as expected. Writing consistent Git commits also enhances collaboration between project stakeholders. Well-crafted Git commit messages open the door to automatic versioning and generated changelog files. Consequently, several initiatives are underway to standardize the messages written in Git commits.
In the first part of this series, we set up our project by installing different Python versions with pyenv, setting a local version of Python with pyenv, and encapsulating it into a virtual environment with poetry. Here we show more precisely how to unit test your Python application and how to enforce and validate your Git commit messages. The source code associated with this article is published on GitHub.
Testing our code
The project is a simple Python function that summarizes the data present in a pandas DataFrame. The function outputs the number of rows and columns and the frequency of each data type present in the pandas DataFrame:
---- Data Summary ------
Go to your project root directory and activate your virtual environment:
poetry shell
We add a couple of dependencies using poetry:
poetry add -D pynvim numpy pandas
The -D flag indicates that the dependency only applies to development environments.
Note: I personally use NeoVim for coding, which is why I need the pynvim package to support NeoVim Python plugins.
Based on the expected output defined above, our program is made of three steps:
- Getting the shape of the pandas DataFrame.
- Getting the pandas dtypes frequency.
- Concatenating the two results into a unified DataFrame that we will use to output the final result.
Once the final DataFrame is obtained, we output the result as depicted above. With this in mind, our code scaffold could look like the following:
import pandas as pd
Let’s now start writing our unit tests. We are going to use the unittest tool available with the Python standard library. You may remember from the previous article that pytest was defined as a developer dependency for testing. This is not an issue because pytest natively runs tests written with the unittest library.
Unit tests are single methods that unittest expects you to write inside Python classes. Choose a descriptive name for your test classes and methods. The name of your test methods should start with test_. Additionally, unittest uses a series of special assertion methods inherited from the unittest.TestCase class. In practice, a test should cover precisely one feature, be autonomous without requiring external cues, and recreate the conditions of its own success.
To recreate the necessary environment, setup code must be written. If this code happens to be redundant, implement a setUp() method, which is executed before every single test. This is a convenient way to reuse and reorganize your code. Depending on your use case, you may have to perform systematic operations after the tests have run. For that, you may use the tearDown() method.
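To make the setUp() and tearDown() mechanics concrete, here is a minimal sketch that is unrelated to our project. The class name TestCart and the list it manipulates are made up for the illustration; only the unittest calls come from the standard library.

```python
import unittest


class TestCart(unittest.TestCase):
    """Hypothetical test case used only to illustrate setUp() and tearDown()."""

    def setUp(self):
        # Runs before every test method: each test gets a fresh list.
        self.cart = ["apple", "pear"]

    def tearDown(self):
        # Runs after every test method, whether the test passed or failed.
        self.cart.clear()

    def test_cart_starts_with_two_items(self):
        self.assertEqual(len(self.cart), 2)

    def test_added_item_is_present(self):
        self.cart.append("plum")
        self.assertIn("plum", self.cart)
        self.assertEqual(len(self.cart), 3)


# Run the test case programmatically instead of calling unittest.main(),
# so that the interpreter is not exited at the end of the run.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCart)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Because setUp() rebuilds the list before each method, the item appended in one test never leaks into the other, whatever the execution order.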
First you can read below the unit test we implemented for the data_summary() function:
import unittest
The setUp() method initializes two distinct pandas DataFrames. self.exp_df is the resulting DataFrame we expect to get after calling the data_summary() function, and self.df is the one used to test our functions. At the moment, the tests are expected to fail because the logic has not been implemented. To run the tests with poetry, use the command:
poetry run pytest -v
The -v flag returns a more verbose output for your test results. You can see that your tests are labeled according to the class and function names you gave (i.e., <test_module.py>::<class>::<test_method>).
The code is updated to conform with the unit tests:
import pandas as pd
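As an illustration of the three steps listed earlier (shape, dtypes frequency, concatenation), here is one possible sketch of such a function. It is not the code from the repository: the column name "Values", the index labels "Rows" and "Columns", and the sample DataFrame are assumptions made for the example.

```python
import pandas as pd


def data_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return a one-column DataFrame with the shape and dtype counts of df.

    Sketch only: the "Values" column and the "Rows"/"Columns" labels are
    assumptions, not the layout used in the original project.
    """
    # Step 1: shape of the DataFrame, one row per dimension.
    shape = pd.DataFrame({"Values": list(df.shape)}, index=["Rows", "Columns"])
    # Step 2: frequency of each dtype, indexed by the dtype name.
    counts = df.dtypes.astype(str).value_counts()
    dtypes = pd.DataFrame({"Values": counts.to_numpy()}, index=list(counts.index))
    # Step 3: concatenate both results into a single DataFrame.
    return pd.concat([shape, dtypes])


# Hypothetical sample data: two integer columns and one float column.
sample = pd.DataFrame({"a": [1, 2, 3], "b": [1.0, 2.0, 3.0], "c": [4, 5, 6]})
summary = data_summary(sample)
print(summary)
```

Returning a DataFrame rather than printing directly keeps the function easy to compare against an expected DataFrame in a unit test.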
Run the tests again:
poetry run pytest -v
One last thing here. In our tests, we did not test the actual output. Our module is designed to output a string representation of our DataFrame summary. There are solutions to achieve this goal with unittest. However, we are going to use pytest for this test. Surprising, isn’t it? As said before, pytest interoperates very well with unittest, and we are going to illustrate it now. Here is the code for this test:
import unittest
Notice the decorator @pytest.fixture(autouse=True) and the function it encapsulates (_pass_fixture). In unit test terminology, this method is called a fixture. Fixtures are functions (or methods if you use an OOP approach) that run before each test to which they are applied. Fixtures are used to feed some data to the tests. They serve the same purpose as the setUp() method we used before. Here we are using a predefined fixture called capsys to capture the standard output (stdout) and reuse it in our test. We can then modify our display_summary() code accordingly:
import pandas as pd
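For comparison, the solutions based on unittest alone that were mentioned earlier can be sketched with contextlib.redirect_stdout from the standard library. This is an alternative to the capsys approach, not the code used in this article, and the display_summary() below is a simplified stand-in that prints plain strings instead of a DataFrame.

```python
import contextlib
import io
import unittest


def display_summary(lines):
    """Simplified stand-in: print a header followed by the given lines."""
    print("---- Data Summary ------")
    for line in lines:
        print(line)


class TestDisplaySummary(unittest.TestCase):
    def test_output_contains_header_and_lines(self):
        buffer = io.StringIO()
        # Everything printed inside the block goes to the buffer, not the terminal.
        with contextlib.redirect_stdout(buffer):
            display_summary(["Rows 3", "Columns 2"])
        output = buffer.getvalue().splitlines()
        self.assertEqual(output[0], "---- Data Summary ------")
        self.assertEqual(output[1:], ["Rows 3", "Columns 2"])


# The runner reports on stderr, so it does not interfere with the captured stdout.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDisplaySummary)
result = unittest.TextTestRunner().run(suite)
```

The capsys fixture does the same job with less ceremony, which is one reason to let pytest handle this particular test.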
Then run the tests again:
poetry run pytest -v
The tests now succeed. It is time to commit and share our work, for example by publishing it to GitHub. Before that, let’s take a close look at how to properly communicate about our work with Git commit messages while respecting and enforcing a common standard.
Enforce Git commit message rules in your Python project
Writing optimal Git commit messages is not an easy task. Messages need to be clear, readable, and understandable in the long term. The Conventional Commits specification proposes a set of rules for creating explicit commit histories.
Using commitizen
In our series about JavaScript monorepos, we saw how to integrate these conventions to enforce good practices regarding commit messages. Applied to Python, we are going to use a package called commitizen to achieve this. Let’s add this package to our developer dependencies:
poetry add -D commitizen
To set up commitizen for your project, run the command cz init. It prompts us with a set of questions:
cz init
Choose all default choices here as they fit our current situation perfectly. The last question asks us if we want to use a pre-commit hook. We are going to come back to this later on, so just answer no for now. If we look at our pyproject.toml file, we can see that a new entry named [tool.commitizen] has been added:
[...]
To check your commit message, you can use the following command:
cz check -m "all summarize_data tests now succeed"
Our message is rejected because it does not respect the commit rules. The last line suggests some patterns to use. Take some time to read the Conventional Commits documentation and run the command cz info to print a short documentation:
cz info
This command guides you on how to write your commit message. Here the format should be "[pattern]: [MESSAGE]". For us, this leads to:
cz check -m "test: all summarize_data tests now succeed"
Very good, our commit message is valid. But hold on. Checking our messages by hand each time with commitizen might be cumbersome, and nothing guarantees that the check is actually applied. It would be better to automatically check the message each time we use the git commit command. That is where the pre-commit hook comes into play.
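To make the "[pattern]: [MESSAGE]" format more tangible, here is a rough sketch of how a commit header could be parsed with a regular expression. This is not commitizen's implementation: the list of allowed types and the exact pattern are simplifying assumptions, and the function name parse_header is made up for the example.

```python
import re

# Assumed list of commit types; the rules configured in a real project may differ.
ALLOWED_TYPES = ("build", "ci", "docs", "feat", "fix", "perf", "refactor", "style", "test")

# Simplified header: "<type>(<optional scope>)<optional !>: <description>"
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_header(message):
    """Return the parts of the first line of message, or None if it is invalid."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = HEADER_RE.match(first_line)
    return match.groupdict() if match else None


rejected = parse_header("all summarize_data tests now succeed")
accepted = parse_header("test: all summarize_data tests now succeed")
scoped = parse_header("fix(parser)!: handle empty input")
print(rejected, accepted, scoped, sep="\n")
```

The first message has no type prefix and is rejected, while the two others are split into their type, optional scope, breaking-change marker, and description.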
Automatically enforce Git message conventions with pre-commit
Git hooks are useful to automate actions at specific points in the Git lifecycle. The pre-commit hook lets us run scripts before a Git commit is issued. We can use the hook to validate the commit messages and prevent Git from using a message that doesn’t match our expectations. The hook is active from the command line as well as from any tool interacting with the Git repository where the hook is registered, including your favorite IDE.
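To give an idea of what happens under the hood, here is a hand-written sketch of a message-checking hook. Git calls a commit-msg hook with the path of the file holding the message and aborts the commit when the hook exits with a non-zero status. The allowed prefixes and the deliberately minimal check are assumptions made for the example; this is not what commitizen or pre-commit actually install.

```python
import sys
import tempfile
from pathlib import Path

# Assumed commit types, kept short for the illustration.
ALLOWED_PREFIXES = ("build", "docs", "feat", "fix", "refactor", "test")


def main(argv):
    """Return 0 when the message file named in argv[1] has a valid header, else 1."""
    lines = Path(argv[1]).read_text(encoding="utf-8").splitlines()
    header = lines[0] if lines else ""
    prefix, separator, description = header.partition(": ")
    if separator and prefix in ALLOWED_PREFIXES and description.strip():
        return 0
    print(f"commit message rejected: {header!r}", file=sys.stderr)
    return 1


# In a real hook file, the script would end with: sys.exit(main(sys.argv))
# Here we simulate two calls with a temporary message file instead.
with tempfile.TemporaryDirectory() as tmp:
    message_file = Path(tmp) / "COMMIT_EDITMSG"
    message_file.write_text("test: all summarize_data tests now succeed\n", encoding="utf-8")
    accepted_code = main(["commit-msg", str(message_file)])
    message_file.write_text("all summarize_data tests now succeed\n", encoding="utf-8")
    rejected_code = main(["commit-msg", str(message_file)])

print(accepted_code, rejected_code)
```

Maintaining such scripts by hand in every repository quickly becomes tedious, which is the problem the tooling presented next addresses.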
pre-commit is a framework for managing and maintaining multi-language pre-commit hooks. If you want to know more about the inner workings and the spectrum of possibilities opened by the pre-commit hook, you can read its usage documentation.
To install pre-commit, just run:
poetry add -D pre-commit
To automate the Git commit verification, we first need to create a configuration file .pre-commit-config.yaml as follows:
1 |
|
Next we can install the hook with its source defined in the repo property:
pre-commit install --hook-type commit-msg
Now that everything is set, we can use our Git hook:
git commit -m "test: all summarize_data tests now succeed"
pre-commit installs an environment to run its checks. As you can see here, the commit message assessment passed. To finish, we can commit and push the modifications made on the build files (poetry.lock, pyproject.toml) and our module:
git commit -m "build: add developer dependencies" -m "commitizen and pre-commit added to our dev dependencies"
We can now push everything to our GitHub repository:
git push origin master
Conclusion
We covered a few topics:
- First, we saw how to write unit tests for your code. You should always write tests before coding. It helps you refine your API and expectations before implementing them. You will definitely benefit from it. We used unittest, which is already available in the Python standard library. I actually like its simple design and object-oriented approach, but others prefer using the pytest library, which is definitely worth checking out. One very convenient aspect is that pytest supports the unittest.TestCase class out of the box. You can then write your tests with either of the two libraries, or even mix both depending on your needs, and have one common command to run them all.
- We saw how to enforce good practices when writing Git commit messages. Our proposed solution relies on two distinct Python packages: commitizen and pre-commit. The first one provides the tools to check whether a message follows the conventions you have chosen. The second one automates the process using a Git hook.
In our next and last article, we are going to go one step further. We automate testing using tox and integrate it inside a CI/CD pipeline. Once done, we will show how to prepare our package and finally publish it on PyPI using poetry.
Cheat sheet
poetry
Add project dependencies:
poetry add [package_name]
Add developer dependencies:
poetry add -D [package_name]
poetry add --dev [package_name]
Run tests:
poetry run pytest
commitizen
Initialize commitizen:
cz init
Check your commit:
cz check -m "YOUR MESSAGE"
pre-commit
Generate a default configuration file:
pre-commit sample-config
Install Git hook:
pre-commit install --hook-type [hook_name]
Acknowledgments
This article was first published in Adaltas blog and kindly reviewed by the CEO David Worms and one consultant Barthelemy NGOM.