Blogs by Folkert
Fixing our own problems in the Rust compiler
In our data compression projects, we use Rust where C is traditionally used. During the work, we've hit limitations in Rust itself and in the surrounding tooling. Over the years, we've become increasingly comfortable with fixing these issues ourselves.
Emulating avx-512 intrinsics in Miri
Recently we've started work on using more avx512 features in zlib-rs. The avx512 family of target features provides SIMD intrinsics that use 512-bit vectors (double the size of avx2, which uses 256-bit vectors). These wider intrinsics can speed up certain algorithms dramatically.
Fixing rust-lang stdarch issues in LLVM
A couple of months ago I became a co-maintainer of rust-lang/stdarch, which defines vendor-specific APIs that are used by the Rust standard library and Rust users writing explicit SIMD code.
SIMD in zlib-rs (part 2): compare256
In part 1 of the "SIMD in zlib-rs" series, we've seen that, with a bit of nudging, autovectorization can produce optimal code for some problems.
But that does not always work: with SIMD clever programmers can still beat the compiler. This time we'll look at a problem where the compiler is not currently capable of using the SIMD capabilities of modern CPUs effectively.
What is my fuzzer doing?
Fuzz testing is incredibly useful: it has caught many a bug during the development of NTP packet parsing and gzip/bzip2 (de)compression.
But I've always been unsatisfied with the fuzzer being a black box. When it runs for hours and reports no issues, what do we actually learn from that? In ntpd-rs we've previously had a bug fly under the radar because the fuzzer just did not reach a large chunk of code. So, does my fuzzer actually exercise the code paths that I think it should?
I'm fascinated by the creative use of SIMD instructions. When you first learn about SIMD, it is clear that doing more multiplications in a single instruction is useful for speeding up matrix multiplication. But how can all of these weird instructions be used to solve problems that aren't just arithmetic?
Translating bzip2 with c2rust
Over the past couple of months we've been hard at work on libbzip2-rs, a 100% Rust drop-in compatible implementation the bzip2 compression and decompression functionality.
For this project, we used c2rust for the initial translation from the C code to a Rust implementation. The generated Rust code has now been cleaned up and made safe where possible. This post describes our experiences using c2rust for this project.
zlib-rs is faster than C
Power consumption of an experimental webserver
This article was authored by Jordy Aaldering and Folkert de Vries
Over the past couple of months, we teamed up with Bernard van Gastel and Jordy Aaldering at Radboud University's Software Energy Lab to measure nea's energy efficiency.
Rust interop in practice: speaking Python and Javascript
Current zlib-rs performance
Our zlib-rs project implements a drop-in replacement for libz.so, a dynamic library that is widely used to perform gzip (de)compression.
Tock binary size
Tock is a powerful and secure embedded operating system. While Tock was designed with resource constraints in mind, years of additional features, generalizing to more platforms, and security improvements have brought resource, and in particular, code size bloat.
Rust in Production at Tweede golf (podcast)
In February of 2024, I was invited by Matthias Endler of Corrode to join him on his podcast Rust in Production. We discussed how Tweede golf uses Rust in production, to ensure the safety and security of critical infrastructure software.
Building an Async Runtime with mio
Asynchronous programming is pretty weird. While it is straightforward enough to understand in principle (write code that looks synchronous, but may be run concurrently yada yada yada), it is not so obvious how and when async functions actually perform work. This blog aims to shed light on how that works in Rust.
ntpd-rs: Folkert explains the project (video)
At the GOSIM Conference in Shanghai, last September, I had the opportunity to talk about ntpd-rs, our project implementing the Network Time Protocol.
Statime continues: Boundary Clocks and Master Ports
About one year ago, Tweede Golf announced "Statime", a Rust implementation of the Precision Time Protocol (PTP). The result of that first phase was a working proof of concept. Quite a bit has changed since then.
Report: NTP security audit
In March/April 2023 ntpd-rs underwent a security audit. The audit was executed by Radically Open Security and funded by NLnet Foundation. The audit did not uncover any major issues, but did help us make ntpd-rs more robust. It has been extremely valuable to have someone from outside of the development team look at the code in detail.
ntpd-rs: NTP for the modern era (video)
At RustNL 2023, a Rust conference held in Amsterdam recently, I had the opportunity to talk about ntpd-rs, our project implementing the Network Time Protocol.
While working on the Roc compiler, we regularly dive deep on computer science topics. A recurring theme is speed, both the runtime performance of the code that we generate, as well as the performance of our compiler itself.
One extremely useful technique that we have been playing with is data-oriented design: the idea that the actual data you have should guide how code is structured.
Sorting with SIMD
Google recently published a blog article and paper introducing their SIMD-accelerated sorting algorithm.
SIMD stands for single instruction, multiple data. A single instruction is used to apply the same operation to multiple pieces of data. The prototypical example is addition, where one instruction can do e.g. 4 32-bit additions. A single SIMD addition should be roughly 4 times faster than performing 4 individual additions.
This kind of instruction-level parallelism has many applications in areas with a lot of number crunching, e.g. machine learning, physics simulations, and game engines. But how can this be used for sorting? Sorting does not involve arithmetic, and the whole idea of sorting is that each element moves to its unique correct place in the output. In other words, we don't want to perform the same work for each element, so at first sight it's hard to see where SIMD can help.
To understand the basic concepts, I played around with the ideas from the paper Fast Quicksort Implementation Using AVX Instructions by Shay Gueron and Vlad Krasnov. They provide an implementation in (surprisingly readable) assembly on their github. Let's see how we can make SIMD sort.
Implementing the Network Time Protocol (NTP) in Rust
For the last couple of months we at Tweede golf have been working on implementing a Network Time Protocol (NTP) client and server in Rust.
The project is a Prossimo initiative and is supported by their sponsors, Cisco and AWS. Our first short-term goal is to deploy our implementation at Let's Encrypt. The long-term goal is to develop an alternative fully-featured NTP implementation that can be widely used.
Using C libraries in your Rust project
Recently, we gave a workshop for the folks at iHub about using Rust, specifically looking at integrating Rust with cryptography libraries written in C.
Optimizing Image Processing on the Edge
Over the past months, we have worked with Scailable to optimize their neural network evaluation. Scailable runs neural networks on edge devices, taking a neural network specification and turning it into executable machine code.
Async on Embedded: Present & Future
In our last post, we've seen that async can help reduce power consumption in embedded programs. The async machinery is much more fine-grained at switching to a different task than we reasonably could be. Embassy schedules the work intelligently, which means the work is completed faster and we race to sleep. Our application actually gets more readable because we programmers mostly don't need to worry about breaking up our functions into tasks and switching between them. Any await is a possible switching point.
Now, we want to actually start using async in our programs. Sadly there are currently some limitations. In this post, we'll look at the current workarounds, the tradeoffs, and how the limitations might be partially resolved in the near future.
Measuring power consumption: sync vs. async
Previously we talked about conserving energy using async. This time we'll take a look at performing power consumption measurements. Our goal is first to get a feel for how much power is consumed, and then to measure the difference between a standard synchronous and an async implementation of the same application.
To more effectively write Embedded Rust applications, we want a clearer picture of two aspects: how can we ergonomically perform multiple tasks concurrently, and how can we exploit low-power modes to save energy. In the coming weeks, we want to write a small but non-trivial application that communicates with 2 sensors, uses async, and uses the low-power modes to conserve energy.
In embedded systems, energy efficiency is crucial for practical applications. Usually devices run on a battery, so the less energy you use, the longer the power supply will last. In this post we'll look at the basics of going to sleep and waking back up, and build a proof of concept using the nRF52840 development kit.