While working on the Roc compiler, we regularly dive deep on computer science topics. A recurring theme is speed, both the runtime performance of the code that we generate, as well as the performance of our compiler itself.
One extremely useful technique that we have been playing with is data-oriented design: the idea that the actual data you have should guide how code is structured.
Google recently published a blog article and paper introducing their SIMD-accelerated sorting algorithm.
SIMD stands for single instruction, multiple data. A single instruction is used to apply the same operation to multiple pieces of data. The prototypical example is addition, where one instruction can do e.g. 4 32-bit additions. A single SIMD addition should be roughly 4 times faster than performing 4 individual additions.
This kind of instruction-level parallelism has many applications in areas with a lot of number crunching, e.g. machine learning, physics simulations, and game engines. But how can this be used for sorting? Sorting does not involve arithmetic, and the whole idea of sorting is that each element moves to its unique correct place in the output. In other words, we don't want to perform the same work for each element, so at first sight it's hard to see where SIMD can help.
To understand the basic concepts, I played around with the ideas from the paper Fast Quicksort Implementation Using AVX Instructions by Shay Gueron and Vlad Krasnov. They provide an implementation in (surprisingly readable) assembly on their github. Let's see how we can make SIMD sort.
The Dutch government offers the AHN [](https://www.ahn.nl/) as a way to get information about the height of any specific place in the country. They offer this data by using a point cloud. That is, a large set of points with some additional meta information. With the current version of the AHN the resolution of the dataset is about eight points per square meter. This results in about 2.5TB of compressed data for the relatively small area of the Netherlands. While this is something that is not impossible to store locally, it does offer some challenges.
At tweede golf, we value innovation: we take the time to research new technologies and subsequently challenge ourselves to try out these new techniques in order to discover new applications. We also like to learn by doing: build something first, ask questions later.