Blogs by Folkert
While working on the Roc compiler, we regularly dive deep on computer science topics. A recurring theme is speed, both the runtime performance of the code that we generate, as well as the performance of our compiler itself.
One extremely useful technique that we have been playing with is data-oriented design: the idea that the actual data you have should guide how code is structured.
Google recently published a blog article and paper introducing their SIMD-accelerated sorting algorithm.
SIMD stands for single instruction, multiple data. A single instruction is used to apply the same operation to multiple pieces of data. The prototypical example is addition, where one instruction can do e.g. 4 32-bit additions. A single SIMD addition should be roughly 4 times faster than performing 4 individual additions.
This kind of instruction-level parallelism has many applications in areas with a lot of number crunching, e.g. machine learning, physics simulations, and game engines. But how can this be used for sorting? Sorting does not involve arithmetic, and the whole idea of sorting is that each element moves to its unique correct place in the output. In other words, we don't want to perform the same work for each element, so at first sight it's hard to see where SIMD can help.
To understand the basic concepts, I played around with the ideas from the paper Fast Quicksort Implementation Using AVX Instructions by Shay Gueron and Vlad Krasnov. They provide an implementation in (surprisingly readable) assembly on their github. Let's see how we can make SIMD sort.
For the last couple of months we at Tweede golf have been working on implementing a Network Time Protocol (NTP) client and server in Rust.
The project is a Prossimo initiative and is supported by their sponsors, Cisco and AWS. Our first short-term goal is to deploy our implementation at Let's Encrypt. The long-term goal is to develop an alternative fully-featured NTP implementation that can be widely used.
In our last post, we've seen that
async can help reduce power consumption in embedded programs. The async machinery is much more fine-grained at switching to a different task than we reasonably could be. Embassy schedules the work intelligently, which means the work is completed faster and we race to sleep. Our application actually gets more readable because we programmers mostly don't need to worry about breaking up our functions into tasks and switching between them. Any
await is a possible switching point.
Now, we want to actually start using async in our programs. Sadly there are currently some limitations. In this post, we'll look at the current workarounds, the tradeoffs, and how the limitations might be partially resolved in the near future.