Blogs by Folkert

Tech blog on web, security & embedded
In February of 2024, I was invited by Matthias Endler of Corrode to join him on his podcast Rust in Production. We discussed how Tweede golf uses Rust in production, to ensure the safety and security of critical infrastructure software.
Asynchronous programming is pretty weird. While it is straightforward enough to understand in principle (write code that looks synchronous, but may be run concurrently yada yada yada), it is not so obvious how and when async functions actually perform work. This blog aims to shed light on how that works in Rust.
At the GOSIM Conference in Shanghai, last September, I had the opportunity to talk about ntpd-rs, our project implementing the Network Time Protocol.
About one year ago, Tweede Golf announced "Statime", a Rust implementation of the Precision Time Protocol (PTP). The result of that first phase was a working proof of concept. Quite a bit has changed since then.
The latest release of ntpd-rs compiles on several new targets: the FreeBSD and macOS operating systems now work, and ntpd-rs now supports musl libc on Linux. The PRs adding support for these platforms are all community contributions, which is very exciting.
In March/April 2023 ntpd-rs underwent a security audit. The audit was executed by Radically Open Security and funded by NLnet Foundation. The audit did not uncover any major issues, but did help us make ntpd-rs more robust. It has been extremely valuable to have someone from outside of the development team look at the code in detail.
At RustNL 2023, a Rust conference held in Amsterdam recently, I had the opportunity to talk about ntpd-rs, our project implementing the Network Time Protocol.

While working on the Roc compiler, we regularly dive deep on computer science topics. A recurring theme is speed, both the runtime performance of the code that we generate, as well as the performance of our compiler itself.

One extremely useful technique that we have been playing with is data-oriented design: the idea that the actual data you have should guide how code is structured.

December 9, 2022

Sorting with SIMD

Google recently published a blog article and paper introducing their SIMD-accelerated sorting algorithm.

SIMD stands for single instruction, multiple data. A single instruction is used to apply the same operation to multiple pieces of data. The prototypical example is addition, where one instruction can do e.g. 4 32-bit additions. A single SIMD addition should be roughly 4 times faster than performing 4 individual additions.

This kind of instruction-level parallelism has many applications in areas with a lot of number crunching, e.g. machine learning, physics simulations, and game engines. But how can this be used for sorting? Sorting does not involve arithmetic, and the whole idea of sorting is that each element moves to its unique correct place in the output. In other words, we don't want to perform the same work for each element, so at first sight it's hard to see where SIMD can help.

To understand the basic concepts, I played around with the ideas from the paper Fast Quicksort Implementation Using AVX Instructions by Shay Gueron and Vlad Krasnov. They provide an implementation in (surprisingly readable) assembly on their github. Let's see how we can make SIMD sort.

For the last couple of months we at Tweede golf have been working on implementing a Network Time Protocol (NTP) client and server in Rust.

The project is a Prossimo initiative and is supported by their sponsors, Cisco and AWS. Our first short-term goal is to deploy our implementation at Let's Encrypt. The long-term goal is to develop an alternative fully-featured NTP implementation that can be widely used.

Recently, we gave a workshop for the folks at iHub about using Rust, specifically looking at integrating Rust with cryptography libraries written in C.
Over the past months, we have worked with Scailable to optimize their neural network evaluation. Scailable runs neural networks on edge devices, taking a neural network specification and turning it into executable machine code.

In our last post, we've seen that async can help reduce power consumption in embedded programs. The async machinery is much more fine-grained at switching to a different task than we reasonably could be. Embassy schedules the work intelligently, which means the work is completed faster and we race to sleep. Our application actually gets more readable because we programmers mostly don't need to worry about breaking up our functions into tasks and switching between them. Any await is a possible switching point.

Now, we want to actually start using async in our programs. Sadly there are currently some limitations. In this post, we'll look at the current workarounds, the tradeoffs, and how the limitations might be partially resolved in the near future.

Previously we talked about conserving energy using async. This time we'll take a look at performing power consumption measurements. Our goal is first to get a feel for how much power is consumed, and then to measure the difference between a standard synchronous and an async implementation of the same application.
To more effectively write Embedded Rust applications, we want a clearer picture of two aspects: how can we ergonomically perform multiple tasks concurrently, and how can we exploit low-power modes to save energy. In the coming weeks, we want to write a small but non-trivial application that communicates with 2 sensors, uses async, and uses the low-power modes to conserve energy.
In embedded systems, energy efficiency is crucial for practical applications. Usually devices run on a battery, so the less energy you use, the longer the power supply will last. In this post we'll look at the basics of going to sleep and waking back up, and build a proof of concept using the nRF52840 development kit.