Dealing with Dependencies in Rust
At Tweede golf we are convinced that if software is written in Rust, it will be more robust (compared to legacy languages such as C, C++ or Java), and more efficient (compared to code written in PHP or Python and again, Java).
In order to get more robust software out there, we have to get Rust code running on computers of people who are not themselves Rust developers.
In fact, regular users should not even need to know about Rust or the cargo tool, but still enjoy its benefits by running Rust programs.
Getting Rust into the hands of the general public means we need a way of distributing it. In the Linux world, this means Rust software needs to be packaged by Linux distributions such as Debian and Ubuntu, which is what recently happened for our sudo and ntpd projects. A challenge that is sometimes overlooked by Rust developers is that of dependencies. Overlooked, because they are so easily found on crates.io, documented in a uniform way on docs.rs, and the standard build tool for Rust, cargo, makes it so easy to manage them--what's the problem?
But of course, in order to package our software for a Linux distribution, it is also necessary that all its dependencies are packaged and distributed, using the distribution's own time-tested, highly trusted and robust procedures for distributing software.
And that's when dependencies become something to pay more attention to.
At Tweede golf, while thinking about dependency evaluation, I hacked together a small experimental tool (about 300 lines of Rust code) that checks the health of the dependencies of a Rust project by listing several somewhat arbitrary vitality indicators for each of them, such as its overall age, recent downloads, and average number of releases in the last 90 days. The idea is to have an easy check to see if we are using any dependencies that are poorly maintained or unpopular (and so, perhaps, potentially more problematic).
This was quite easy, and this tool required just a few dependencies:
- cargo_metadata, for easily accessing Rust project information contained in
- crates_io_api, to safely interact with the experimental https://crates.io API.
- octocrab, to interact with the GitHub API through a Rust interface
- chrono, for working with times and timestamps
- tokio, for its async runtime--needed since Octocrab only offers an asynchronous API
- async-trait, which to be honest I probably didn't need
Most of these, I hope, would seem reasonable to a Rust programmer: there is no need to reinvent the wheel or spend hours figuring out GitHub's REST API ourselves. And Tokio is such a widely used crate that you might be forgiven for thinking it is somehow "part of standard Rust".
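As a rough illustration of what such vitality indicators could look like (this is not the code of the actual tool), here is a minimal sketch that uses only the standard library. The CrateStats record, its hand-filled numbers and the thresholds in needs_closer_look are all assumptions made up for this example; the real tool fetches its data from crates.io and GitHub.

```rust
/// Hypothetical, hand-filled record; a real tool would fetch this from a registry.
struct CrateStats {
    name: &'static str,
    /// Days since the first release was published.
    age_days: u32,
    /// Downloads over the recent period reported by the registry.
    recent_downloads: u64,
    /// Age in days of every published release.
    release_ages_days: Vec<u32>,
}

/// Number of releases published within the last `window_days` days.
fn releases_within(stats: &CrateStats, window_days: u32) -> usize {
    stats
        .release_ages_days
        .iter()
        .filter(|&&age| age <= window_days)
        .count()
}

/// Flags a crate as worth a closer look.
/// Both thresholds are arbitrary assumptions chosen for the example.
fn needs_closer_look(stats: &CrateStats) -> bool {
    let stale = releases_within(stats, 365) == 0;
    let unpopular = stats.recent_downloads < 10_000;
    stale || unpopular
}

fn main() {
    // Made-up data, not measurements of real crates.
    let crates = vec![
        CrateStats {
            name: "busy-crate",
            age_days: 2000,
            recent_downloads: 5_000_000,
            release_ages_days: vec![10, 45, 80, 400],
        },
        CrateStats {
            name: "quiet-crate",
            age_days: 900,
            recent_downloads: 5_000,
            release_ages_days: vec![400, 700],
        },
    ];
    for c in &crates {
        println!(
            "{}: {} days old, {} recent downloads, {} releases in 90 days, closer look: {}",
            c.name,
            c.age_days,
            c.recent_downloads,
            releases_within(c, 90),
            needs_closer_look(c)
        );
    }
}
```

Indicators like these are deliberately crude: they only point at crates that deserve a second look and say nothing about the quality of the code itself.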
But the cargo tree command reveals that adding these dependencies pulls in, in turn, a total of 141 dependencies. Some of these are even duplicates. For example, crates such as base64, socket2, syn and time get included in two different, incompatible major versions, which can raise some eyebrows. And if I run cargo update (as I should often do, as a well-meaning Rust developer), the situation becomes worse (at the time of writing) because then I get two versions of ring and several of its dependencies.
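To show what "duplicate" means here, the following sketch groups a resolved dependency list by crate name and reports every crate that occurs in more than one version. It is a simplified stand-in for the kind of report cargo tree can produce, not how cargo implements it; the function name duplicated_crates and the version numbers in main are illustrative assumptions, not the actual versions in my project.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns every crate name that appears with more than one version,
/// together with the sorted set of versions that were found.
fn duplicated_crates(resolved: &[(&str, &str)]) -> BTreeMap<String, BTreeSet<String>> {
    let mut versions: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (name, version) in resolved {
        versions
            .entry(name.to_string())
            .or_default()
            .insert(version.to_string());
    }
    // Keep only the crates for which more than one distinct version was seen.
    versions.retain(|_, found| found.len() > 1);
    versions
}

fn main() {
    // Illustrative (name, version) pairs, as a lock file might list them.
    let resolved = [
        ("base64", "0.13.1"),
        ("base64", "0.21.5"),
        ("chrono", "0.4.31"),
        ("syn", "1.0.109"),
        ("syn", "2.0.39"),
    ];
    for (name, found) in duplicated_crates(&resolved) {
        let list: Vec<&str> = found.iter().map(String::as_str).collect();
        println!("{name} is included {} times: {}", list.len(), list.join(", "));
    }
}
```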
This is not entirely a Rust-specific situation, however. As Simon Heath demonstrates, a typical C program also pulls in many dependencies (although it would be strange to have multiple conflicting versions of a dependency in a C program).
Why having many dependencies is a bad thing
So why do some people care at all about dependencies? We can view this through the helpful lens of the burden problem and the trust problem.
The burden problem
The burden problem simply means that every dependency that needs to be distributed requires a certain amount of effort. It is simply easier to ship Prossimo's Rust implementation of sudo (which only has 4 dependencies, which are probably already in every Linux distribution) than Project Pendulum's ntpd implementation (which has more). On top of that, unlike cargo, Debian's package management doesn't handle multiple versions of the same dependency very well. So Debian maintainers have to perform a careful balancing act to package the most compatible versions of common dependencies. To this end, they will also often make changes to Cargo.lock and even Cargo.toml files, perhaps even undoing some of the choices that the original Rust developers made.
The situation is even more complex considering there is no clear guideline on how to manage versions. Some crates very specifically pin themselves to versions in their Cargo.toml file, while others may specify a version in their Cargo.toml file that they are actually no longer compatible with, but are rescued by the fact that cargo update put a much more recent version in their
The trust problem
The trust problem, on the other hand, is concerned with the amount of code that is pulled in. If I write a small program of 300 lines, and add 141 dependencies (as above), how much of the code in the resulting binary is actually something that I as a developer can vouch for? What if a security problem arises in any of those 141 dependencies: will it be fixed in time by their maintainers? Will someone hypothetically distributing this tool be able to compile my program against the new updated version?
And we could be even more paranoid: what if the crates.io account of one of the maintainers of one of those 141 crates is compromised, and an attacker manages to sneak in some devious code that my program then executes while it is being built?
All of these problems become much less worrisome if you have fewer dependencies. And indeed, the sudo core team decided to be radical in their choice of dependencies for this reason, limiting the dependencies to just libc (which every Rust project implicitly depends on anyway) and log (maintained by the Rust project) and
Why having many dependencies is a good thing
But of course, such a radical approach comes at a cost: you simply cannot have certain features. For ntpd, a similar effort was made, but reducing that project to just three dependencies would be nigh impossible.
Another cost is that developers will start re-inventing solutions, or (perhaps better) copying the code they need from dependencies. But that in fact just adds a "hidden dependency". How likely is that cut-and-pasted code to be updated if the original source fixes a bug in it? Isn't it better to just find a popular dependency that has the functionality your project needs, and use it?
A bigger standard library?
"Ah!", I hear voices say, "but we also need so many dependencies because the Rust standard library is small; why don't we build a bigger extended standard library?". While it is true that the standard library sometimes has surprising omissions, those features are typically contained in popular crates. Really popular crates that fill such gaps occasionally even get incorporated into the standard library after they have proven their worth (such as what happened to once_cell). So what are we missing?
A big standard library has the downside that it will lock in "mistakes". It took the C++ standard library quite some years to get rid of auto_ptr, and the dubious vector<bool> is still part of it after 25 years. The Go standard library has a vulnerability in its XML handling that has been known since December 2020. It's hard to get rid of these mistakes, since it can only be done by breaking some existing code. Making matters worse, a vulnerability in a large, widely-used library will affect many more applications. If the mistake is just in a small dependency, its impact is smaller, and it's much easier to identify the projects that might break when the mistake is removed.
A big standard library will still need to be maintained by many different authors, each responsible for one particular part. But how many authors, and which parts? It's easy to see for small dependencies, since that information is explicit on crates.io, but that isn't the case for large dependencies.
In summary, if developers say that "many dependencies are bad", they are implying that "large dependencies are good", and we at Tweede golf don't subscribe to that point of view.
It's all about trust
What matters most in this story is, of course, that the Rust ecosystem is made up of ... people. So what is hidden behind all these seemingly technical challenges is trust in people. And people can be hard to trust: it essentially means you surrender some control over your software to an outside party. On top of that, trust cannot be demanded, it has to be earned. So, it is inherently tied up with unsatisfying things like reputation, authority and using results from the past to predict events in the future.
Try this thought experiment: you are working on a project for a privacy-sensitive component for an important customer. You need a highly specific system interface not part of the standard library and you find a crate that supplies this, maintained by a single person; last update was 400 days ago; and it was downloaded by 5000 other people. Would you add this as a dependency to your project, or would you rather copy the pertinent parts of that crate into your own code base? How confident are you that a vulnerability in this crate will be handled in a timely and responsible manner?
Yet another kind of trust is that in the cargo supply chain itself. When I run cargo build for the first time, I see a lot of packages being downloaded from the internet somewhere. All of these can contain a build.rs file that is executed shortly after. This means I assume that the SSL certificate for crates.io hasn't been compromised (even though this is not impossible); the login passwords of the maintainers of the crate I am downloading have not been compromised; and obviously, that the authors did not hide any secret backdoors anywhere. Or, at the very least, I assume that whenever my faith in all of this turns out to be unfounded, I will "know about it" since news of it will be doing the rounds on the usual forums. And I make these assumptions every time I use a dependency.
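To make concrete what it means for a build.rs file to be executed: a build script is an ordinary Rust program that cargo compiles and runs on your machine before it builds the crate itself. The sketch below is a harmless, hypothetical example (the helper names and the file name build_script_ran.txt are made up) that merely writes a marker file, but nothing stops such a program from reading or writing anything your user account can.

```rust
// build.rs (hypothetical): cargo compiles and runs this before building the crate.
use std::env;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Location of the marker file: inside OUT_DIR when run by cargo,
/// or in the system temp directory when run by hand.
fn marker_path() -> PathBuf {
    env::var_os("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(env::temp_dir)
        .join("build_script_ran.txt")
}

/// Writes the marker file; a build script has ordinary file system access.
fn write_marker() -> io::Result<PathBuf> {
    let path = marker_path();
    fs::write(&path, "this file was written by build.rs\n")?;
    Ok(path)
}

fn main() {
    let path = write_marker().expect("could not write marker file");
    // Lines starting with "cargo:" on stdout are instructions to cargo.
    println!("cargo:warning=build script wrote {}", path.display());
    println!("cargo:rerun-if-changed=build.rs");
}
```

All of this happens as a side effect of cargo build, which is exactly why the assumptions listed above matter.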
What is the current situation?
Rust developers are likely to respond by saying "that's a feature, not a bug" to some of the observations above. And they are actually right.
On the Linux distribution side, we can look at what Debian does. They have an overview of which crates are packaged by Debian. At the time of writing, this is more than 2000 crates. All of these will be signed off by a Debian maintainer, and represent a high degree of trust. On the other hand, they will lag a bit behind the bleeding edge version available on crates.io. To cope with the versioning issue, Debian maintainers often make small changes to the crates they package. For example, in the package for pem 1.0.2, they introduced a patch to bump the depended-on version of base64 to the one they already packaged (even though it is technically an incompatible version). And you can find many, many examples of this. In some cases, they can't get away with this and they will actually have to package two versions of a crate using a different name (for example, both clap 2.x and clap 3.x are available in Debian).
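What "technically an incompatible version" means can be spelled out in a few lines. A plain version requirement in Cargo.toml is treated as a caret requirement: an update is considered compatible as long as it does not change the leftmost non-zero component of the version. The sketch below is a simplified model of that rule, assuming full major.minor.patch versions without pre-release tags; it is not cargo's actual implementation, and the names Version, parse and caret_accepts as well as the version numbers in main are illustrative.

```rust
/// A plain major.minor.patch version (no pre-release or build metadata).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Parses exactly three dot-separated numbers, e.g. "1.0.2".
fn parse(text: &str) -> Option<Version> {
    let mut parts = text.split('.').map(|part| part.parse::<u64>().ok());
    let version = Version {
        major: parts.next()??,
        minor: parts.next()??,
        patch: parts.next()??,
    };
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(version)
}

/// Simplified caret rule: the candidate must not be older than the requirement
/// and must share its leftmost non-zero component.
fn caret_accepts(required: Version, candidate: Version) -> bool {
    if candidate < required {
        return false;
    }
    if required.major > 0 {
        candidate.major == required.major
    } else if required.minor > 0 {
        candidate.major == 0 && candidate.minor == required.minor
    } else {
        candidate == required
    }
}

fn main() {
    // Illustrative version numbers only.
    let checks = [
        ("2.33.0", "2.34.0"),
        ("2.33.0", "3.0.0"),
        ("0.13.0", "0.13.1"),
        ("0.13.0", "0.21.0"),
    ];
    for (required, candidate) in checks {
        let ok = caret_accepts(parse(required).unwrap(), parse(candidate).unwrap());
        println!("requirement {required} accepts {candidate}: {ok}");
    }
}
```

Under this rule a crate still in its 0.x series makes a breaking change with every minor version bump, which is one reason why patching a requirement to a different 0.x release is not automatically safe.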
In short, the Debian maintainers have to do most of the work to smooth over the differences. And they are only a handful, while there are many, many crates. Perhaps it is time that, for some important packages, we make their life a bit easier?
So, if we disallow saying that "nothing is wrong", and we also don't like to introduce a new big everything-but-the-kitchen-sink standard Rust library, what is the solution?
The answer is, of course, that both Rust developers and maintainers of crates have to, at the very least, acknowledge these problems. Some concrete examples:
When making libraries, take care that you use APIs that don't change frequently. Also, when introducing a new API that eventually replaces an older API, bump your major version twice: once when you introduce the new API and once when you remove the older API. This avoids crates appearing compatible with the older API even if they are not.
Check regularly whether your crate still compiles with the minimum supported versions of the dependencies you list in your Cargo.toml. We like to call this the minimum version check, which can be performed like this:
cargo +nightly -Zdirect-minimal-versions update && cargo build
(Do take care that a background process such as rust-analyzer doesn't interfere with this!)
Use tools such as cargo vet to get some feel for how secure your dependencies are, and how many eyeballs have looked at them.
If your library crate might conceivably be useful for inclusion in a Linux distribution (and they often are!), use cargo tree -d to see if you have duplicated versions and try to eliminate those, and tools such as cargo debstatus to check how easily your crate can be packaged for Debian. For example, in the (partial) screenshot below, we can see that for cargo pulse to be packaged, cargo_metadata also needs to be included in Debian, but that would be possible, since all of its dependencies are already there.
If your library crate has a CVE, make sure this advisory gets picked up by rustsec, so that it becomes visible in cargo audit. If we look at the current list of advisories, we are pretty sure it's not entirely complete.
As a community, it would also be great to rally behind a set of crates that are universally considered to be "good to use" such as tokio. This can help newcomers to the language too. An attempt to make this list can be seen at https://blessed.rs, but of course, that website just represents the opinions of its authors. Perhaps we should all try to contribute to it?
How we dealt with this at Tweede golf for our sudo and ntpd projects was to be highly selective of dependencies. Neither of these projects has duplicated dependencies in it, and we check in our CI that they compile with the minimal version of their dependencies specified in their Cargo.toml files. In both cases, there were also strong security arguments against being promiscuous with dependencies.
Let's talk more
Of course, the issues discussed above have not been completely tackled in this single blog post. At Tweede golf we will continue to think about them and contribute towards a solution. For example, as I am writing this, two of my colleagues are en route to the Tectonics event to exchange views on these and other matters, with the aim of finding ways to give Rust adoption in critical infrastructure an extra push.
We are really curious to hear what you think about the issues with dependencies. Perhaps you disagree that there even is a challenge, or perhaps you are worried that your own project has too many of them. Or perhaps you have a complementary recommendation that we missed in the above list. Do get in touch! Perhaps we can even meet at FOSDEM 2024 and exchange ideas!
Fabio Valentini (package maintainer for the Fedora Project) added (via mastodon.social) that cargo deny implements some of the checks we recommend as well. It should really have been in the list of recommendations!
Thomas Karpiniec raises very similar points in this blog post, which also explains steps you can take if you want to fetch your dependencies from the Debian-curated collection of Rust crates instead of crates.io.