Report: NTP security audit

Folkert
Systems software engineer
In March/April 2023 ntpd-rs underwent a security audit. The audit was executed by Radically Open Security and funded by NLnet Foundation. The audit did not uncover any major issues, but did help us make ntpd-rs more robust. It has been extremely valuable to have someone from outside of the development team look at the code in detail.

[Figure: Findings by threat level]

In May the project had its first CVE. It was a further learning opportunity in how to write secure software, and luckily it did not impact many users.

Why audit ntpd-rs

The Network Time Protocol (NTP) synchronizes time across a network. Accurate timekeeping is essential for various security protocols like TLS or DNSSEC. These protocols rely on being able to expire keys after a certain amount of time. By manipulating the time of a system, the system can be tricked into believing that an expired key is still valid.

This is why security has been a top priority during our development. A lot of care has been put into ensuring that the clock remains accurate in the presence of a compromised source of time. For instance, we mandate a minimum number of time sources, and require that a subset of those sources roughly agrees on the true time.
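As a rough illustration of the "minimum number of sources that roughly agree" idea, the sketch below looks for the largest group of measured offsets that lie close together and refuses to produce a result when that group is too small. It is not the selection algorithm ntpd-rs uses; the function name `agreed_offset`, the constant `MIN_AGREEING_SOURCES`, and the tolerance parameter are all hypothetical.

```rust
/// Hypothetical threshold: how many sources must roughly agree.
const MIN_AGREEING_SOURCES: usize = 3;

/// Returns the median offset (in seconds) of the largest group of
/// measurements that lie within `tolerance` of each other, or `None`
/// if that group has fewer than `MIN_AGREEING_SOURCES` members.
fn agreed_offset(offsets: &[f64], tolerance: f64) -> Option<f64> {
    // Ignore NaN and infinite measurements, then sort ascending.
    let mut sorted: Vec<f64> = offsets.iter().copied().filter(|o| o.is_finite()).collect();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Sliding window over the sorted offsets: find the longest run whose
    // spread (largest minus smallest) is at most `tolerance`.
    let mut best: &[f64] = &[];
    let mut start = 0;
    for end in 0..sorted.len() {
        while start < end && sorted[end] - sorted[start] > tolerance {
            start += 1;
        }
        let window = &sorted[start..=end];
        if window.len() > best.len() {
            best = window;
        }
    }

    if best.len() < MIN_AGREEING_SOURCES {
        return None; // not enough agreement: do not steer the clock
    }
    Some(best[best.len() / 2])
}
```

With offsets of 10 ms, 12 ms, 11 ms and one outlier of 5 s, and a tolerance of 5 ms, the three close measurements form the agreeing group and the outlier is ignored. A single compromised source can therefore not move the result on its own.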

Memory Safety

A significant proportion of security vulnerabilities arises from memory safety issues. Software can inadvertently expose sensitive data in memory or trigger a segmentation fault, taking the whole process down.

But in Rust, the compiler guarantees memory safety, at least for code that does not use unsafe blocks or functions. We do use some unsafe code to configure UDP sockets (so they will provide more accurate timestamps) and to actually steer the clock. However, the unsafe surface area is small, isolated, and wrapped in safe interfaces.

Key Material

The NTS (Network Time Security) extension of NTP uses TLS to establish a trusted link between an NTP server and client. This means some sensitive security keys are kept in memory, and can potentially be extracted by an attacker. While this attack would be difficult to pull off, the audit recommended that we make it even harder.

The zeroize crate ensures that the memory storing the key is set to zero when the key is dropped:

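// Imports assumed from the aes-siv and zeroize crates.
use aes_siv::{siv::Aes128Siv, Key};
use zeroize::{Zeroize, ZeroizeOnDrop};

// The key is the only field, so zeroing it clears all secret material.
pub struct AesSivCmac256(Key<Aes128Siv>);

// Marker trait: documents that this type zeroes itself when dropped.
impl ZeroizeOnDrop for AesSivCmac256 {}

impl Drop for AesSivCmac256 {
    fn drop(&mut self) {
        // Overwrite the key bytes with zeros before the memory is released.
        self.0.zeroize();
    }
}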

However, this does not entirely guarantee that the key is no longer somewhere in memory, because Rust is allowed to move values in memory. When a key is moved, its bytes are copied to the new location, and a stale copy can remain at the original location, where it is never zeroed. We have made additional changes to minimize how often keys are moved, to make the chance of key material staying behind as small as possible.

Fuzzing

An NTP server, which provides its time to NTP clients, exposes a UDP port to the network, and processes messages arriving on that port. That is a scary thing to do, because anything could arrive on that port.

Because the port is public, we have to make sure our parser can handle any input. It is not allowed to make assumptions about the format. Otherwise, an attacker could send a carefully crafted sequence of bytes that triggers a panic in the parsing logic, which would take the whole NTP server down.

Fuzzing is an extremely effective approach for making robust parsers. The random input that a fuzzer generates quickly finds issues during development. The audit recommended that we add additional fuzzing, and our auditor even contributed some fuzz tests.
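To give a feel for what a fuzz test does, here is a crude, standard-library-only stand-in: it feeds pseudo-random byte strings to a parse function, and any panic in that function aborts the run. Real fuzzers (for example those driven by cargo-fuzz) are coverage-guided and far more effective than this. The names `XorShift64`, `fuzz_smoke` and `parse_u16` are hypothetical and do not come from ntpd-rs.

```rust
/// Tiny xorshift64 generator so that the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Calls `target` with `iterations` pseudo-random inputs of up to `max_len`
/// bytes. A panic inside `target` propagates and fails the run.
fn fuzz_smoke(seed: u64, iterations: usize, max_len: usize, target: impl Fn(&[u8])) {
    let mut rng = XorShift64(seed.max(1)); // xorshift must not start at 0
    let mut buf = Vec::with_capacity(max_len);
    for _ in 0..iterations {
        buf.clear();
        let len = (rng.next() as usize) % (max_len + 1);
        buf.extend((0..len).map(|_| rng.next() as u8));
        target(&buf);
    }
}

/// Hypothetical parse target: reads a big-endian u16 from the first two
/// bytes, returning `None` (rather than panicking) on short input.
fn parse_u16(input: &[u8]) -> Option<u16> {
    let bytes = input.get(..2)?;
    Some(u16::from_be_bytes([bytes[0], bytes[1]]))
}
```

A call such as `fuzz_smoke(42, 10_000, 64, |data| { let _ = parse_u16(data); })` exercises the parser with ten thousand inputs, including empty and one-byte ones. Had `parse_u16` indexed `input[1]` directly, the run would panic almost immediately.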

Our first CVE

But of course, fuzzing only finds issues in code you test. It turned out that our fuzz tests missed a crucial path related to NTS, and this path had a classic bug: the data has a length field, but the actual message could be shorter than that length. The code assumed, however, that the length field was correct, causing an out-of-bounds access. Because Rust has bounds checking, an out-of-bounds access triggers a panic rather than reading invalid memory, but that still meant invalid input could bring the server down.
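The class of bug described here can be avoided by checking the claimed length against the bytes that are actually available before slicing. The sketch below uses a simplified, hypothetical layout (a 2-byte type, a 2-byte length of the value, then the value itself); it is not the ntpd-rs parser, and `Field`, `ParseError` and `parse_field` are illustrative names.

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    /// Fewer bytes are available than the header or length field requires.
    Truncated,
}

#[derive(Debug, PartialEq)]
struct Field<'a> {
    field_type: u16,
    value: &'a [u8],
}

/// Parses one field and returns it together with the remaining input.
fn parse_field(input: &[u8]) -> Result<(Field<'_>, &[u8]), ParseError> {
    // `get` returns `None` instead of panicking when the range is out of bounds.
    let header = input.get(..4).ok_or(ParseError::Truncated)?;
    let field_type = u16::from_be_bytes([header[0], header[1]]);
    let claimed_len = usize::from(u16::from_be_bytes([header[2], header[3]]));

    let body = &input[4..]; // cannot panic: `input` has at least 4 bytes
    if claimed_len > body.len() {
        // The length field promises more data than was actually received.
        return Err(ParseError::Truncated);
    }
    let (value, rest) = body.split_at(claimed_len);
    Ok((Field { field_type, value }, rest))
}
```

A message that claims 8 bytes of value but carries only 1 now yields `Err(ParseError::Truncated)`, which the caller can log and drop, instead of a panic that takes the process down.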

The bug was discovered after the audit, and has now been fixed. It was actually quite interesting to run through the CVE process, and it is a good thing to gain this experience while we have a limited number of users.

GitHub makes this process extremely easy. It provides private forks to work on a fix, and guides you through requesting the CVE.

Denial of Service

The vast majority of findings in the audit relate to denial of service: being able to take down an NTP server, making it unable to respond to further requests.

[Figure: Findings by type]

In the simple case, an attacker can trigger a panic in the system. Sometimes we had just been sloppy and called .unwrap() on an Option or Result. In other cases, a combination of configuration parameters could hit code that we thought was unreachable.

It is hard to root out all of these panics. This process is complicated by the fact that sometimes we do actually want to panic. When a client or server reaches an invalid state, it will abort. We prefer this behavior over trying to keep running, and potentially steering the clock in a (very) wrong direction.

A client can attempt to degrade service for other clients by sending many messages in close succession. To combat such an attack, we have implemented a rate limiting system. Our implementation works reasonably well given the tradeoffs, but is not as strong as we'd like for IPv6 traffic.
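A minimal sketch of per-client rate limiting, assuming a simple "at most one request per interval per address" policy, could look as follows. This is not the ntpd-rs implementation: `RateLimiter` and `allow` are hypothetical names, and the unbounded `HashMap` is a simplification, since a real server needs a table of bounded size.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Allows at most one request per `min_interval` for each client address.
struct RateLimiter {
    min_interval: Duration,
    /// Time of the last request that was served, per client address.
    /// Simplification: this map grows without bound.
    last_seen: HashMap<IpAddr, Instant>,
}

impl RateLimiter {
    fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_seen: HashMap::new() }
    }

    /// Returns `true` if a request from `client` arriving at `now` should
    /// be served. Taking `now` as a parameter keeps the logic testable.
    fn allow(&mut self, client: IpAddr, now: Instant) -> bool {
        let too_soon = self
            .last_seen
            .get(&client)
            .map_or(false, |&previous| {
                now.saturating_duration_since(previous) < self.min_interval
            });
        if too_soon {
            return false; // drop the request; the timestamp is not refreshed
        }
        self.last_seen.insert(client, now);
        true
    }
}
```

Keying on the full address is one plausible reason such a scheme is weaker for IPv6 (an assumption here, not something the post states): a client that controls a large IPv6 prefix can send each request from a different address and never hit the limit.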

Conclusion

Most of the findings have now been addressed. The remaining problems are low severity, and figuring out the best way to solve them will require a bit of time.

But a fresh perspective on our code also found many small rough edges that didn't have much to do with security per se. For instance, we accepted any configuration fields instead of reporting an error when an unrecognized field was present, and some logging messages weren't clear enough to outsiders.

Overall, this has been a very valuable experience. Our code has improved significantly, and we've learned a lot about writing secure and robust software in Rust.

The full audit report can be found here.
