Assume you're maintaining a project (an application or a library) and it needs a new dependency - a library, maybe a framework. How do you actually add it to the project? You look around and see multiple alternatives:
- some people just put their direct dependencies (say, `requests`) in a `requirements.txt` file and call it a day
- some also add a more or less specific version (e.g. `requests>=2.25`)
- others install them, call `pip freeze` and put its entire result in there
- and finally: there are more advanced solutions like `pip-tools`
So many options! Oh, and it seems like the choice depends on whether you're maintaining an application or a library, why is that?
Please note this is not a guide for any particular dependency management tool (although I will suggest a few) but rather a high-level overview of managing dependencies in a Python project - and why you would want to do it in a well-structured way. Also, there's a TL;DR section near the end if you're not interested in the details.
Otherwise, I will try to explain why there's more than one way to do it and what the tradeoffs between them are, so you can make an informed decision - whether you're reasonably new to having dependencies in your project (and figuring out how to start) or already familiar with dependency management but struggling to grasp the big picture.
But first let's go through some definitions:
- *version pinning* - specifying an exact version of a dependency (e.g. `requests==2.25.1`)
- *version restricting* - specifying a range of versions of a dependency (e.g. `requests>=2.20,<3`)
- *direct dependency* - a dependency your code depends on directly; most likely it means an `import` statement in your code (e.g. `import requests`)
- *transitive dependency* - a dependency of a dependency, but not a direct one (say your app depends on `requests`, which itself uses `urllib3`, but your code doesn't directly use `urllib3`; in this case `urllib3` is a transitive dependency of your app)
I believe the *restricting* wording is not very widespread, but I intend to use it throughout this article to be concise. If you think there's a better way to call them, feel free to correct me.
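To make the distinction concrete, here is a small, hypothetical `requirements.txt` fragment (the package names and version numbers are only illustrative):

```
# pinned: exactly this version, nothing else
requests==2.25.1

# restricted: any version in this range is acceptable
flask>=1.1,<2
```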
What does that mean, exactly?
This may feel obvious but actually listing our needs will make things down the road easier:
- (A) listing the libraries, frameworks etc. your project depends on directly
- (B) being able to easily upgrade/downgrade them - a specific one or all at once
- (C) being able to drop a dependency (because your code ceases to use it) - along with transitive dependencies that are no longer necessary
- (D) discerning the transitive dependencies from the direct ones
- (E) installing them - for runtime, but also for running tests or some static analysis - either on a CI server or locally, and either isolated from the system (e.g. in a virtual environment or a Docker image) or not
- (F) installing them with some degree of reproducibility
I understand sometimes it's not as easy to classify your project as an application or a library - it might be a library but with a CLI for example - but let's start with this simple distinction and build on that.
But first, let's dive more into...
If you have some series of commands you execute to install your project along with its dependencies - be it for deployment, for running tests, development or something else - then reproducibility of that process means:
- a) all your dependencies (direct and transitive) are installed in the same versions each time
- b) all dependencies are installed from the same code each time - this is different because a dependency may very well be re-released under the same version but with different code - possibly by another party with malicious intent; of course it doesn't happen often, but the point is: it may happen
To achieve a) you need a full list of your dependencies (including transitive ones) with their exact versions. b), in addition, requires some form of hashes/checksums of dependencies' code. Don't worry, there are good tools for that.
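As a sketch, a frozen list with checksums can look like the fragment below (the digest is a placeholder, not a real hash; `pip` verifies such entries when run in hash-checking mode, e.g. `pip install --require-hashes -r requirements.txt`):

```
requests==2.25.1 \
    --hash=sha256:<digest-of-the-published-artifact>
```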
Why is reproducibility useful?
- the more aspects of that installation you control (e.g. commands, dependencies - direct or transitive), the less the chance of something changing without your intent - or awareness; what could cause that?
- a release of a new version of your project's dependency; upon running a build, will the new version be used without you knowing? will it introduce some obvious breaking change? will it change something that will manifest in a week rather than instantly?
- hijacking the repository of a dependency and releasing it under the same version number as before but with some malicious code
- injecting some malicious code with the help of some man-in-the-middle-style attack when the dependency is being installed during a build
- it can be very helpful when reproducing a bug (or other behavior) - allowing us to precisely change only one aspect of our project (e.g. a specific part of our code, or change a version of only one particular dependency) while keeping all the rest intact
- without a reproducible tree of dependencies we can't fully control what dependencies (including transitive ones) and in what versions they get installed
- you can analyze/inspect the dependencies in the exact versions that would be used in production later
- e.g. by looking for security vulnerabilities with some tool (e.g. with `safety`; note that `safety` will only check the exact pinned versions)
- or using any other static analysis tool
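For example, a sketch of such a scan, assuming the `safety` CLI (version 2.x or earlier) is installed in the environment (recent versions replace `check` with `scan`):

```shell
# scan the pinned versions in a frozen list for known vulnerabilities
safety check -r requirements.txt
```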
Be aware that depending on what your commands actually do, there are more things that can change: system libraries (including the Python interpreter used for runtime and installation), system kernel etc. but these are out of scope of this article. Let's just remember that reproducibility is not exactly a binary thing but rather a spectrum of choices.
On the other hand, we may want *auto upgrades* of the dependencies when we install them - so they get installed in their newest possible versions (honoring the version restrictions) without us having to do anything besides re-running the installation (build); this of course takes reproducibility away.

At the end of the day you need to consider your situation and choose which option will work better for you. My personal default is reproducibility and more control, then checking whether the benefits of *auto upgrades* outweigh the costs of lost reproducibility.
Now, this all applies easily to application projects. What does *reproducibility* even mean for maintaining libraries?
Reproducibility for libraries
Let's focus on maintaining libraries for a while. The term *reproducibility* is more nuanced here, because your dependencies get installed:
- locally or during some continuous integration (CI) process: for development, running tests and static analysis
- after distributing, in your users' installations (so your library becomes their dependency itself)
For Python libraries, you shouldn't ever pin your dependencies' versions for the second purpose (in an attempt to make it somehow *reproducible*) - because it's very easy to lead your users into version conflicts (*dependency hell*) that way.
You may consider reproducibility for CI and even development. That's a double-edged sword, though:
- with reproducible builds no unexpected breaking change in your dependencies would cause you trouble in CI or for local development - because you have a full tree of dependencies that you know is working
- ...but that also means you won't be notified by your CI about such a breaking change so you won't know you need to act (by restricting a version of a dependency, or otherwise adjusting your library to that change)
I don't know of any good solution to that: either your process blocks changes in your library until you fix such a problem, or you go with reproducibility and establish a process of regularly updating the frozen versions (like Dependabot) - the more frequent the updates, the quicker you'd catch any breakages.
Let's discuss this aspect of dependency management a bit more: why do we sometimes restrict the version of a dependency to some range?
There are a few possible reasons and most of them (if not all) are really situational and depend on a particular project (e.g. its versioning scheme and support guarantees*):
- a) `foo>=1.2` - e.g. to be sure `foo` gets installed in version `1.2` or higher, because your code depends on some feature or behavior that was introduced in `1.2`
- b) `bar!=1.2.3` - e.g. because `bar-1.2.3` introduced some bug that you are sure will get fixed in a later release
- c) `baz<2` - e.g. when you're fairly sure `baz-2.0` will introduce some changes that will break your project and you don't want to get surprised by that release
- d) `baz>1.2,<2` - you can mix a) and c) together
As new versions of your dependencies get released (or when your code using them changes) the related restrictions will likely need changing, too.
Side-note: it's a good idea to document each restriction (e.g. in commit messages) so you'll know why it's there in the first place and will be able to upgrade with more confidence. Also, to avoid cargo-culting: imagine you're inheriting a `requirements.txt` file or copying some part of a project to the next one, and you see `flask<1.2` - ask yourself: is the `<1.2` part still required? A little piece of documentation (especially if it's quickly accessible through `git annotate`) would definitely help.
* no versioning scheme will guarantee shielding from all possible breaking changes; it may only reduce the probability
There are two forms of dependency specification that I often encounter, each keeping the dependencies in a single file; they do their job to some extent, but each is lacking:
1. Direct dependencies only
A list of (only) direct dependencies:
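For instance, an illustrative sketch (the package names match the frozen list shown later):

```
flask
requests
```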
or, with versions restricted for any reason:
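An illustrative sketch of such a restricted list (the versions are made up for the example):

```
flask>=1.1,<2
requests>=2.24
```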
Such a list is often put:
- in a `requirements.txt` file
- in the `install_requires`/`extras_require` sections of your `setup.py`
So this form:
- lists your direct dependencies (A) and keeps them clearly discernible from the transitive ones (D)
- makes up-/downgrading (B) and dropping (C) easy
- can be used for installing (E), but without reproducibility (F) - transitive dependencies and exact versions are left for the installer to resolve each time
2. Frozen list only
Pretty much the result of a `pip freeze` command run in an environment with all (hopefully!) the necessary dependencies installed - a complete, flattened tree of dependencies (direct and transitive) with their exact versions:
```
certifi==2020.12.5
chardet==4.0.0
click==7.1.2
Flask==1.1.2
idna==2.10
itsdangerous==1.1.0
Jinja2==2.11.3
MarkupSafe==1.1.1
requests==2.25.1
urllib3==1.26.4
Werkzeug==1.0.1
```
Such a list is often put... also into a `requirements.txt` file (sometimes named differently, e.g. `requirements-frozen.txt` - the exact name doesn't really matter).

It'd be preferable to also have checksums of the source code of each version (for even more reproducibility and security assurances), but `pip freeze` doesn't provide that.
Assuming such a frozen list really represents a complete dependency tree:
- it's good for installing them (E) - with reproducibility (F)
- but your direct dependencies (A) are now thrown into the same bag as the transitive ones, so you can't easily discern the two (D)
- maintenance (easy up-/downgrading (B) and dropping (C)) is often clunky, to put it mildly
- especially dropping a dependency - how do you know which transitive dependencies are still required when you can't tell which ones are transitive in the first place? cleaning this up would either require some manual tracking of transitive dependencies or... yes, exactly! recreating the list of direct dependencies and going from there
There's a tool that can help a little bit with these problems - `pipdeptree` - but it'd still require brittle, manual work.
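As a sketch (assuming `pipdeptree` is installed into the same environment), its reverse-tree view can at least show which installed packages depend on a given one - e.g. on `urllib3`:

```shell
# list the packages that (directly or transitively) require urllib3
pipdeptree --reverse --packages urllib3
```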
Some projects - likely in the absence of a clear dependency management system - try to mix these two together:
```
flask>=1.1
pytest<5
requests==2.22
urllib3>1.26
```
Imagine a possible history of this file:
- `flask>=1.1` - because everyone says we need 1.1
- `pytest<5` was copied from the previous project - but I don't know why it was there
- `requests==2.22` is there because somewhere around May 2019 Bob said we need to upgrade to 2.22 - and nobody has touched that ever since
- `urllib3>1.26`? Alice said there was some security issue with 1.26 - but we don't even import urllib3 in our code, does it really have to be here? Nobody is able to say for sure
There are good explanations for why we may end up in such a situation, and managing dependencies like this may be bearable - but that doesn't make it any better: it's less structured, which makes it harder to read, reason about and update. It can also lead to dependency hell more easily. Eventually it becomes a place where - if any changes are necessary - you make random modifications in frustration and mumble *please just work already!*
To sum up
Now we see that these options, while seemingly doing the same job, really serve different purposes. When a project resorts to using one over the other (or, even worse, crams them together), it surrenders a few aspects of full-fledged dependency management (knowingly or not).

Which leads to the realization: both of these forms are useful, and you would likely want to use both - resulting in a dependency management system that fits all the needs we discussed earlier, along with a clear way to make your builds reproducible (if you need it).
*But maintaining them manually feels like a burden!* - you'd say. And it is! That's why the frozen list's maintenance should be automated using the best tools available, e.g. `pip-tools`. Such tools were designed to do exactly this:
- you maintain a list of direct dependencies (in pip-tools' case, in a `requirements.in` file)
- and the tool helps you with maintaining the frozen list (a fully pinned `requirements.txt`)
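In pip-tools' case, the workflow can be sketched as follows (assuming `pip-tools` is installed; `--generate-hashes` is optional, but it adds the checksums needed for pip's hash-checking mode):

```shell
# resolve requirements.in into a fully pinned requirements.txt
pip-compile --generate-hashes requirements.in
# make the current environment match the frozen list exactly
pip-sync requirements.txt
```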
There are probably more options available. But if you're certain no ready-made solution suits your needs, you can go with some custom scripts - but please, please review the existing tools first. :)
Each of the official how-to guides will explain their usage better than this (already long) article would do - so let me just point you towards their documentation:
If you're 100% sure you want to go more low-level:
- with a `setup.py`: keep the direct dependencies there, and manage the frozen list (e.g. as a `requirements.txt` file) with `pip install .` and then `pip freeze | grep --invert-match 'YOUR_PACKAGE_NAME==' > requirements.txt`
- as above, but with two plain files; you can call them e.g. `requirements.in` and `requirements.txt` - manage the frozen list using `pip install -r requirements.in` and then `pip freeze > requirements.txt`
If you want to manage dependencies for:
- a library (for applications or other libraries to use):
- an application:
If you have any "why?" questions about these - I hope the article above answers them well enough. :)