Does using Rust really make your software safer?

A real-world critical vulnerability
In 2021, a vulnerability was discovered in the Nucleus real-time operating system, sold by Siemens. According to security researchers:
(...) more than 3 billion devices use this real-time operating system, such as ultrasound machines, storage systems, critical systems for avionics and others.
In other words, this code had many applications, including ones where Bad Things™ must not happen. So what went wrong?
Networked devices running on Nucleus need to deal with domain names such as `tweedegolf.nl` by interacting with a DNS server. The vulnerable part of Nucleus that read the responses sent by a DNS server worked fine in the happy-path scenario: real responses containing correct information were processed correctly.
But it is possible to craft phoney "DNS responses" containing intentional "mistakes". Malicious hackers could use this to trick Nucleus into writing to memory locations that it really shouldn't.
With massive consequences: simply by overwriting a few sensitive bits, the attacker could get a device to crash. And remember that any program code itself is also stored somewhere in memory, so a more clever attacker could even try to reprogram a device running Nucleus to do whatever they want.
Now don't worry! Nucleus has been patched, so we can all sleep safe and sound again.
Why you should care
Except lightning didn't just strike Nucleus. Three other networking libraries were also found to contain several similar problems. Collectively, these vulnerabilities were christened NAME:WRECK. This suggests a more general problem with how this kind of code was written in the past.
We were informed about this case by our friends from security consultancy firm Midnight Blue. The question they posed us: can you show how Rust would prevent this?
This blog will answer that question. In the first half we will provide a high-level answer that does not go into too much technical detail. The second half is aimed at C, C++ and/or Rust programmers and will take a deeper dive into the actual code of Nucleus and how you could write it in modern Rust.
Our claim is indeed that using Rust would have prevented this. This is not just a matter of saying "Rust is memory safe" (although it is). We'll go beyond that! We performed a small engineering experiment which convinced us that, had modern Rust been used:
- Programmers would not have introduced these vulnerabilities.
- Any exploit attempt would have resulted in a recoverable error.
- The code would have been more thoroughly tested.
- Time and money would have been saved.
Root causes
Why do such mistakes happen? As programmers we are often tempted to focus on details, but conceptually the answer is simple:
- The tools used to write software don't help prevent mistakes, and in fact make it hard to detect them if they have been made.
- External input is implicitly trusted instead of being explicitly validated.
Now it's easy to point fingers and say "Ha! Those silly C programmers and their buffer overflows!". But let's not judge too harshly: some of this code was written in greener days before security was in the collective consciousness. Why on earth would a DNS server send botched messages? And when Nucleus was first built in 1993, well, was there even a realistic alternative to C for writing a real-time operating system?
How does Rust help, in practice?
Rust is a memory-safe language. This means that under most circumstances, it is guaranteed that a program written in Rust will not read from or modify memory that it is not supposed to access.
But for decoding RFC1035-encoded domain names, we hypothesized that, besides automatically giving memory safety, Rust would have two additional advantages:
- It is a more expressive algorithmic language, which means that an idiomatic Rust solution will contain fewer issues requiring special care than an idiomatic C solution.
- It's very easy to write unit tests and fuzz tests, which will encourage programmers to more critically examine their code.
The experiment
We decided to test our hypothesis using ourselves as guinea pigs. First, we wrote a description of RFC1035-style DNS message encoding and presented this to several colleagues as a programming exercise, with the instructions to spend three to four hours on it. We gave this exercise to two interns and two staff members.
In the meantime, we analysed the `DNS_Unpack_Domain_Name` routine (as listed in the [Forescout report]) to build a suite of stress tests that exercise every problem it has. We also wrote a fuzzer that identified some additional common problems in DNS implementations. We kept all of this secret from the participants.
The problem statement was deliberately underspecified: it contained a link to RFC1035, but not a clear instruction to study it. The objective was to simulate a "let's code this on a Friday afternoon" situation with incomplete information, and some time pressure. Conditions in which bugs thrive.
(Just for fun, we also fed the exercise to ChatGPT to see what would happen. But that is a story for another time!)
Results
Our stress test contained 6 happy path tests (that Nucleus NET passes), and 12 negative test cases that would cause a crash, erroneous result, or trigger an exploitable condition in the original Nucleus NET. Without going into any details, the table below tallies the results for all tests and compares them to the code from Nucleus NET.
A green mark means a passed test: given some input, the program produced a valid result. For the happy path tests this means that the input was decoded successfully; for the stress tests it means the input was rejected.
Orange means a "normal" test failure: the code either rejected a valid input, or decoded something that it should have rejected. Simple bugs, but not enough to lead to an exploit.
Red means the code failed in a more ominous way: for example hitting a run-time abort (such as a `panic!` in Rust), getting stuck in an infinite loop, or writing to memory locations that should not be written to. In short, red means "exploitable".
implementation | happy path | stress tests |
---|---|---|
Nucleus NET | 🟩🟩🟩🟩🟩🟩 | 🟧🟧🟥🟧🟥🟧🟥🟥🟧🟥🟥🟥 |
Engineer 1 | 🟩🟩🟧🟩🟩🟩 | 🟧🟧🟩🟧🟩🟧🟩🟩🟩🟩🟧🟩 |
Engineer 2 | 🟩🟩🟩🟩🟩🟩 | 🟩🟩🟩🟩🟩🟧🟩🟩🟩🟩🟩🟩 |
Engineer 3 | 🟩🟩🟩🟩🟩🟩 | 🟩🟩🟩🟩🟩🟧🟩🟩🟩🟩🟩🟩 |
Engineer 4 | 🟩🟩🟩🟩🟩🟩 | 🟩🟩🟩🟩🟩🟧🟩🟧🟩🟩🟧🟩 |
Some observations:
- All engineers used fuzzing to test for panic safety, and as a result, no Rust implementation has a red mark.
- The seventh stress test made Nucleus NET enter an infinite loop, enough by itself to cause a denial of service. Without being told, everyone caught this. Three engineers found it through fuzz-testing.
- Most of the remaining 'simple bugs' we observed were subtle violations of the RFC1035 specification, such as ignoring a length limit.
- The sixth stress test was rather pedantic: it checks that a DNS decoder rejects an otherwise reasonable decoding based on a strict interpretation of the word "prior" in RFC1035.
- In some test cases RFC1035 was not clear on what to do precisely. In those cases there were two reasonable outcomes that would earn a green mark.
Evaluation
Let's revisit the four claims we made at the beginning:
- Rust is less likely to have vulnerabilities: Indeed, no engineer introduced an arbitrary code execution vulnerability; nobody felt the need to use `unsafe` Rust.
- Any exploit attempt would have resulted in a recoverable error: All the solutions were panic safe, i.e. they would never cause an abnormal termination of the software.
- Rust code is more thoroughly tested: All engineers wrote unit tests and fuzzed their code within the allotted time. Several engineers discovered a critical error this way.
- Using Rust saves time and money: All these solutions were developed quickly. We also took a stab at a proper C implementation (by an experienced C programmer), and even armed with the knowledge we gained through this experiment, it still took at least three times as long to get a secure version. And that is not even counting the money wasted on patching a vulnerability 20 years down the road, or the potential economic and/or social costs if these vulnerabilities had been actively exploited.
These findings will not be surprising to anybody who has written Rust, or who has researched software security. But we hope that these results will help you think about Rust as more than just "that programming language that has more restrictions".
At Tweede golf, we don't just use Rust because it prevents us from making mistakes; we also use it because it enables us to write safer software while being quick at the same time.
The Deeper Dive
We can hear the programmers cry: show me some code! To learn about the Nucleus NET routine, we can of course recommend reading the [Forescout report], but we will also briefly illustrate the problem here.
Simplified somewhat, RFC1035 says that a domain name in a DNS message is encoded as a sequence of labels, each of which is preceded by a length byte. These labels concatenated together (and interspersed with a `.`) form the human-readable version of a domain name. A zero byte denotes the end of the domain name.
For example, the domain name `google.com` can be represented as:
\x06 | g | o | o | g | l | e | \x03 | c | o | m | \x00 |
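To make the format concrete, here is a minimal sketch of the reverse direction in Rust: turning a dotted name into length-prefixed labels. The function `encode_dns_name` is our own, hypothetical helper. It assumes ASCII input, checks only the 63-byte label limit from RFC1035 (not the overall 255-byte limit), and never produces compression pointers.

```rust
/// Hypothetical helper: encode a dotted name such as "google.com" into
/// RFC1035-style length-prefixed labels, terminated by a zero byte.
/// Only the 63-byte label limit is checked; the 255-byte total limit is not.
fn encode_dns_name(name: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(name.len() + 2);
    for label in name.split('.') {
        // Empty labels (as in "a..b") and labels over 63 bytes cannot be encoded.
        if label.is_empty() || label.len() > 63 {
            return None;
        }
        out.push(label.len() as u8); // the length byte
        out.extend_from_slice(label.as_bytes()); // the label itself
    }
    out.push(0); // a zero byte terminates the name
    Some(out)
}

fn main() {
    let encoded = encode_dns_name("google.com").expect("valid name");
    assert_eq!(encoded, b"\x06google\x03com\x00".to_vec());
    println!("{:02x?}", encoded);
}
```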
A quick-and-dirty C function that decodes this format could be written as follows:
```c
uint8_t *unpack_dns(uint8_t *src) {
    char *buf, *dst;
    int len;
    buf = dst = malloc(strlen(src) + 1);
    while((len = *src++) != 0) {
        while(len--)
            *dst++ = *src++;
        *dst++ = '.';
    }
    dst[-1] = 0;
    return buf;
}
```
(The above routine is actually inspired by the DNS decoding routine from Nut/OS, also used in embedded devices, which was part of yet another collection of vulnerabilities in TCP/IP stacks, so this is realistic code!)
Take some time to spot the errors that can cause this routine to write to invalid memory locations. When you're ready...
1. An attacker can embed null bytes in parts of the "domain" name, making `strlen` report the wrong size and get `malloc` to allocate less memory than is actually needed.
2. In the inner `while` loop, there is no check that `len` does not exceed the capacity of `buf`.
3. The `dst[-1] = 0` near the end is problematic too: if `src` points to a null byte, this line will write to a memory location that lies before the memory allocated by `malloc()`.
We leave it as an exercise to the reader to translate this into a Rust function, and to observe that it's relatively straightforward to get a big boost in safety:
```rust
fn unpack_dns(mut src: &[u8]) -> Option<Vec<u8>> { todo!() }
```
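If you want to compare notes after trying the exercise, here is one possible answer. It is our own minimal sketch, not one of the participants' solutions, and it makes a few assumptions: compression pointers are simply rejected, an empty name is treated as an error, and names longer than 255 bytes are rejected.

```rust
/// One possible (sketch) answer: decode length-prefixed labels into a dotted
/// name. Compression pointers and reserved length bytes are rejected.
fn unpack_dns(mut src: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    loop {
        // Running off the end of the input is an error, not a crash.
        let (&len, rest) = src.split_first()?;
        let len = usize::from(len);
        if len == 0 {
            break; // end of the name
        }
        // Lengths above 63 are pointers or reserved values; also enforce
        // the 255-byte limit on the decoded name.
        if len > 63 || out.len() + len > 255 {
            return None;
        }
        // `get` is bounds-checked: a label longer than the input yields None.
        let label = rest.get(..len)?;
        out.extend_from_slice(label);
        out.push(b'.');
        src = &rest[len..]; // cannot panic: `get(..len)` succeeded
    }
    // Drop the trailing '.'; popping from an empty Vec yields None,
    // so an empty name is rejected instead of writing before the buffer.
    out.pop()?;
    Some(out)
}

fn main() {
    assert_eq!(unpack_dns(b"\x06google\x03com\x00"), Some(b"google.com".to_vec()));
    assert_eq!(unpack_dns(b"\x00"), None);
    println!("decoded ok");
}
```

Each of the three errors above disappears: there is no `strlen`, every slice access stays in bounds, and the trailing-dot removal cannot reach outside the buffer.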
The Nucleus NET code was slightly more complex than the above, since it also tried to deal with a compression scheme described in RFC1035: if a length byte has its two upper bits set (i.e., it is `0xC0` or higher), then it and the byte following it combine into a 14-bit offset into the DNS message, pointing at the remainder of the domain name. This allows a name to refer back to one that appeared earlier in the message. For example, assuming that at offset 14Ah in a DNS response we find:
# | 14A | 14B | 14C | 14D | 14E | 14F | 150 | 151 | 152 | 153 | 154 |
---|---|---|---|---|---|---|---|---|---|---|---|
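.. | \x01 | a | \x03 | n | e | t | \x00 | \x01 | b | \xC1 | \x4C |
Then offset 14Ah encodes `a.net`, and offset 151h encodes `b.net`: with its two upper bits cleared, the pointer `\xC1\x4C` becomes offset 14Ch, where `\x03net\x00` begins.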
It should be obvious that just blindly accepting any offset that the input contains can easily allow a decoder to access memory that should be out-of-bounds.
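A minimal sketch of the checks such an offset needs, assuming a decoder that tracks `limit`, the offset where the data currently being decoded starts (the start of the name, or the target of the previous jump). The names `pointer_offset` and `checked_jump` are hypothetical. Requiring each jump to land strictly before `limit` is the strict reading of "prior" in RFC1035 mentioned in the results, and it also bounds the number of jumps.

```rust
/// Hypothetical helper: if `hi` has both top bits set, combine `hi` and `lo`
/// into the 14-bit offset of an RFC1035 compression pointer.
fn pointer_offset(hi: u8, lo: u8) -> Option<usize> {
    if (hi & 0xC0) != 0xC0 {
        return None; // not a pointer
    }
    // Clear the two flag bits; the remaining 14 bits are the offset.
    Some((usize::from(hi & 0x3F) << 8) | usize::from(lo))
}

/// Hypothetical helper: accept a pointer only if its target lies inside the
/// message and strictly before `limit`. If each accepted target becomes the
/// new `limit`, targets strictly decrease, so the decoder cannot loop forever.
fn checked_jump(message: &[u8], limit: usize, hi: u8, lo: u8) -> Option<usize> {
    let offset = pointer_offset(hi, lo)?;
    (offset < limit && offset < message.len()).then_some(offset)
}

fn main() {
    // The 12-byte DNS header is followed by the question name at offset 12,
    // so answers commonly refer back to it with the pointer C0 0C.
    let message = vec![0u8; 32];
    assert_eq!(pointer_offset(0xC0, 0x0C), Some(12));
    // A name starting at offset 20 may jump back to offset 12...
    assert_eq!(checked_jump(&message, 20, 0xC0, 0x0C), Some(12));
    // ...but a name at offset 12 may not point to itself.
    assert_eq!(checked_jump(&message, 12, 0xC0, 0x0C), None);
    println!("pointer checks passed");
}
```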
We would love to dive into what can go wrong in DNS implementations, but we would be wasting our time: RFC9267, published in 2022, does just that! It is highly readable and contains various examples of mistakes that have occurred in the wild.
We also have some criticisms of RFC1035 itself. For example, the format leaves room for encodings that are clearly useless. We would prefer it if the RFC spelled out clearly that a 14-bit offset is not allowed to jump directly to another offset (i.e. "double jumping") or to a null byte. In some stress tests we used these useless encodings (since they made Nucleus NET crash in interesting ways), but we would have accepted both a correctly decoded domain name and an error condition as a correct answer. It's not even quite clear to us whether a completely empty domain name is a valid encoding or not.
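For readers who want to adopt the stricter reading we would prefer, here is a small sketch of such a check. It is an optional policy of our own, not a requirement we can point to in RFC1035, and the name `target_is_useful` is hypothetical.

```rust
/// Hypothetical, optional policy check: returns false if the byte at the
/// pointer target `offset` is missing, is a zero byte (an empty remainder),
/// or is itself a compression pointer ("double jumping"). Reserved length
/// bytes (0x40..=0xBF) are left to the regular decoding step to reject.
fn target_is_useful(message: &[u8], offset: usize) -> bool {
    match message.get(offset) {
        Some(&b) => b != 0 && (b & 0xC0) != 0xC0,
        None => false,
    }
}

fn main() {
    // offset 0: zero byte, offset 1: pointer C0 00, offset 3: "\x03com\x00"
    let message = [0x00u8, 0xC0, 0x00, 0x03, b'c', b'o', b'm', 0x00];
    assert!(!target_is_useful(&message, 0));
    assert!(!target_is_useful(&message, 1));
    assert!(target_is_useful(&message, 3));
    println!("policy checks passed");
}
```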
Vulnerable C code
Here, in all its detail, is the Nucleus NET code (as it was prior to v5.2, which has patched it) as printed in the Forescout report. We have edited the types for clarity and added some comments:
```c
int DNS_Unpack_Domain_Name(uint8_t *dst, uint8_t *src, uint8_t *buf_begin) {
    int16_t size;
    int i, retval = 0;
    uint8_t *savesrc;
    savesrc = src;
    while(*src) {
        size = *src;
        while((size & 0xC0) == 0xC0) {
            if(!retval) {
                retval = src - savesrc + 2;
            }
            src++;
            src = &buf_begin[(size & 0x3F) * 256 + *src]; /* ! */
            size = *src;
        }
        src++;
        for(i=0; i < (size & 0x3F); i++) { /* ! */
            *dst++ = *src++;
        }
        *dst++ = '.';
    }
    *(--dst) = 0; /* ! */
    src++;
    if(!retval) {
        retval = src - savesrc;
    }
    return retval;
}
```
Let's list some problems:
The expression `&buf_begin[(size & 0x3F) * 256 + *src]` has multiple problems:
- First, it simply accepts whatever offset the input provides and proceeds from that memory location. Whee! 🎢
- Second, this very same line can make us go to a memory offset already visited, causing the aforementioned infinite loop.
- Third, in case this line makes `src` point to a memory location containing a null byte, it will be skipped over, an empty domain name part will be written for good measure, and the code will bravely soldier on decoding whatever comes next.
There are also two things wrong with the `for` loop:
- First, like in our example above, there's no bounds check at all that the result will fit in the memory pointed to by `dst`, or that it will adhere to the maximum length for domain names specified by RFC1035 (which is 255 bytes).
- Second, the expression `size & 0x3F` in the condition of the `for` loop only masks off the top two bits of the length byte, but doesn't actually check it for validity. So an invalid length indicator like 65 will be read as 1, and we're at the mercy of the input beyond this point.
If `*src` initially points to a null byte, this has the same bug as our quick-and-dirty implementation above: in that case, `*(--dst) = 0` near the end of the function is likely to write to memory that is reserved for the memory allocation system.
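The `size & 0x3F` problem comes from treating every length byte the same way. RFC1035 gives the top two bits of a length byte distinct meanings: `00` for a label (with zero marking the end of the name), `11` for a compression pointer, while `01` and `10` are reserved. A minimal sketch that spells these cases out (the `LengthByte` enum and `classify` function are our own illustration, not code from Nucleus or from our engineers) rejects a value like 65 instead of quietly reading it as 1:

```rust
/// Our own illustration: the things a length byte can mean in RFC1035.
#[derive(Debug, PartialEq)]
enum LengthByte {
    End,          // 0x00: the name ends here
    Label(usize), // 0x01..=0x3F: a label of this many bytes follows
    Pointer(u8),  // 0xC0..=0xFF: compression pointer; holds the top 6 offset bits
    Invalid,      // 0x40..=0xBF: reserved top bits 01 or 10, so reject
}

fn classify(byte: u8) -> LengthByte {
    match byte {
        0 => LengthByte::End,
        1..=0x3F => LengthByte::Label(usize::from(byte)),
        0xC0..=0xFF => LengthByte::Pointer(byte & 0x3F),
        _ => LengthByte::Invalid,
    }
}

fn main() {
    // Masking turns 65 into a plausible-looking 1-byte label...
    assert_eq!(65u8 & 0x3F, 1);
    // ...whereas an explicit classification rejects the byte outright.
    assert_eq!(classify(65), LengthByte::Invalid);
    assert_eq!(classify(3), LengthByte::Label(3));
    println!("{:?}", classify(0));
}
```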
What this routine can look like in Rust
As an amalgam of all solutions by our engineers, we have come up with the following 'exemplary' solution to the problem:
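```rust
/// Decode an RFC1035 domain name. `input` starts at the name to decode;
/// `backlog` holds the message bytes before it, the only valid pointer targets.
pub fn decode_dns_name<'a>(mut input: &'a [u8], mut backlog: &'a [u8]) -> Option<Vec<u8>> {
    let mut result = Vec::with_capacity(256);
    loop {
        // Running out of input is an error: `?` returns None.
        match usize::from(*input.first()?) {
            // A zero byte terminates the name.
            0 => break,
            // A label of `prefix` bytes, as long as the name stays within 255 bytes.
            prefix @ ..=0x3F if result.len() + prefix <= 255 => {
                let part;
                (part, input) = input[1..].split_at_checked(prefix)?;
                result.extend_from_slice(part);
                result.push(b'.');
            }
            // A compression pointer: jump backwards into `backlog`, which
            // shrinks on every jump, so this loop always terminates.
            0xC0.. => {
                let (offset_bytes, _) = input.split_first_chunk()?;
                let offset = u16::from_be_bytes(*offset_bytes) & !0xC000;
                (backlog, input) = backlog.split_at_checked(usize::from(offset))?;
            }
            // Reserved length bytes (0x40..=0xBF), or a name that is too long.
            _ => return None,
        }
    }
    // Remove the trailing '.'; an empty name is rejected.
    result.pop()?;
    Some(result)
}
```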
If there are embedded programmers out there who are now foaming at the mouth, laughing at us for allocating a vector: it's quite easy to replace `Vec` here with `heapless::Vec<u8, 256>`. Try it! In fact, this solution becomes simpler, as it will no longer need the `if` guard in the second arm of the `match` expression!
Of course we are biased, but we also think this routine much more clearly expresses what is being done.
(For a Rust implementation actually used in practice, you can also have a look at the `smoltcp` source code. The function signature is slightly different, and there are some stylistic differences, but the logical structure is pretty similar.)
Conclusion
The claims that C code is memory unsafe, that there is actually harmful memory-unsafe code out there, and that Rust could fix this, are nothing new. There is even proof out there, straight from big tech. But we were challenged to do our own experiment, and despite our engineers receiving very limited time and instructions, the resulting Rust code did indeed avoid the memory-safety-related vulnerabilities. If you want, you can even try it yourself.
We keep saying that Rust is how we make software safer. Hopefully either the overview or the technical deep-dive gave you some insight into why we keep saying this, and how that works.