Blog

embedded

Why Rust is a great fit for embedded software

Henk Dieter
Henk Dieter

Rust is nice for a lot of things. At Tweede golf we've been using the language primarily for high-performance web applications. But that's not all Rust can do. Rust can be used to write embedded applications as well. I'd like to help you get an idea of whether Rust is a good fit for programming your next product with. Not only do I want to look at the general pros and cons, compared to the embedded lingua franca: C. I'm going to take you along writing a (very) basic embedded application in Rust as well. Rust for embedded? You be the judge! Rust for embedded. Why? And why not? Good questions, if I may say so myself. Compared to C, Rust is a relatively new language, which has its pros and cons. Like C, Rust can target low-level devices. However, Rust does a lot of things differently from C. Let's start with the things Rust has to offer. Good stuff Safety. That's one of the main selling points for Rust. It basically means that it's hard to accidentally introduce memory safety bugs when writing Rust. Rust makes unsafe operations explicit. It enforces the developer to think about what exactly they are doing. For example, reading uninitialized memory requires an explicit unsafe block. In Rust, using unsafe is like saying to the compiler: 'Even though you can't prove my code is correct, I'm sure I know what I am doing.' And then the developer can create a safe wrapper around that code, which enforces its user to interact with it in a correct manner. This way, the compiler can assure you that all non-unsafe parts are based on only well-thought-out code. Speaking of compile-time guarantees, Rust has a strong type system. Rust's type system is comparable to C's in some ways: it defines structs, enums, and comparable primitive types. One thing Rust adds on top of that is traits. These traits are a means of defining what a type can do. For instance, in embedded systems, traits can describe which CPU pins can be configured as pulse-width modulation (PWM) outputs and how they should be configured. Used in this fashion, traits enable the compiler to dodge common headaches. They are essentially the data sheet, encoded in Rust. The compiler can help you avoid using a peripheral in an unsupported manner. Type generics is another nice feature of Rust. They are a means to tell Rust that a function can take anything it likes, just as long as the type implements certain traits. For example, generics enable a developer to write a function to which any pin can be passed, as long as the pin can be configured as a PWM output. We can write a single function that can control any given PWM pin. Last but not least, Rust is backed by a great community. Many Rustaceans are very willing to help out with solving bugs and other problems. Focusing on helping you improve, most community members stay positive even when asked the most trivial of questions. Not everything is great Rust is not all kittens and rainbows, though. Rust is not as mature as C. Debugging can be quite hard in embedded contexts, and most debug tooling is focused on C or C++ development. Although it's getting better fast, debugging Rust embedded applications can be a bit dodgy sometimes. Rust being a relatively new language, there's not a lot of libraries or examples a developer can use to improve development speed. It means that sometimes, wheels need to be re-invented when developing embedded applications in Rust. And let's not forget to address the elephant in the room. Rust is feared for its most prominent feature: the borrow checker. Simply stated, the borrow checker essentially enforces two rules. The first rule is that if there exists a reference with write access to a certain part of memory, this is the only reference to that part of memory. The second rule is that if shared references to a piece of memory exist, those references cannot be used to write to that piece of memory. Rust's borrow checker is a great tool for avoiding memory safety issues. It means, however, that you can't just go and structure your application as you would in C. Many people, myself included, experienced the steep learning curve that is introduced by 'fighting the borrow checker'. Rust's safety features and generics, among other things, make the compiler do a lot of work it wouldn't have to do without them. Especially for larger projects, this can cause the compile times to go through the roof. Much effort is put into improving this, but as of this moment, it can really become an issue. Nearly every platform can be programmed using C. In Rust, this is not so much the case. Being based on LLVM, Rust only targets platforms LLVM supports. One of the best-supported platforms is ARM Cortex-M. Xtensa (ESP32) or AVR CPU's are no first-class citizens in the LLVM ecosystem, even though they are quite commonly used. Food for thought. I don't expect you to make any decisions on whether your next project should be in Rust, just based on a bunch of pros and cons. We need to go deeper! Rust embedded applications in general Embedded applications written in Rust usually have some things in common. Let's have a look at what those are. Being familiar with the options will help us when we're going to design the firmware. Embedded systems don't have an operating system as your PC does. They don't have as many resources to spare for all the fancy functionality Linux or MacOS provide. Some embedded applications are built on top of a so-called Real-Time Operating System, or RTOS. Being lightweight, they offer certain guarantees about execution timing, as well as drivers for Bluetooth and networking. But Rust can be used on bare-metal systems as well. And when we do, there's no network stack or memory allocator available. We have to tell Rust that we don't want them, as they take up too many resources. This is done by putting the #![no_std] directive on top of the main source code file. With the directive in place, Rust won't include its standard library. We can only use Rust's core functionality. Another thing Rust inserts by default is what's called a 'main-shim'. This shim loads the arguments the application was called with. There's no such thing in an embedded context. The CPU just boots and goes about doing what it is told. To disable insertion of the main-shim, we add the #![no_main] directive. Loading 'er up There are a couple of dependencies, or crates, many embedded applications in Rust use. On important one is embeddedhal. HAL is short for 'Hardware Abstraction Library'. And that's exactly what this crate is. embeddedhal provides traits that can be used to describe all kinds of peripherals and clocks. It enables you to write platform-agnostic drivers, which can run on any CPU Rust supports, and that has the right peripherals available. embedded_hal is the foundation for device-specific hardware abstraction libraries, as we'll see shortly. Applications for ARM Cortex-M-based CPUs often depend on the cortex-m and cortex-m-rt crates. cortexm provides low-level integrations with Cortex-M CPUs. Among other things, it gives us safe methods of interacting with configuration registers. The other crate, cortexmrt, helps us define the correct memory layout parameters. This is necessary in order to place the application at the correct spot in the CPUs memory. cortexm_rt also helps us define the applications' entry point with the #[entry] directive it provides. A very nice tool provided by the Rust Tools team is svd2rust. This tool can read System View Description (SVD) files, which contain information about a CPU's features. For instance, it describes where to put the bytes that need to be sent over UART. svd2rust converts these SVD files to a safe Rust API crate specific to a CPU. This type of crate is called a 'Peripheral access crate' or PAC. The PAC we are going to use is the stm32f3 crate, which provides abstractions to the peripherals of CPUs in the STM32F3xx family. PACs generated with svd2rust contain a singleton Peripherals type, which contains proxies for all peripherals the CPU has. Internally, these proxies are just pointers to certain locations in memory, called registers. They are accompanied by functions with which we can write to or read from these registers, where appropriate. The Peripherals can only be obtained safely once. This means we'll either have to do the initialization in a central spot or, better yet, pass references to the appropriate registers to modules that need access to them. The need for passing references helps rule out conflicting configurations of registers. As the references are subject to the borrow checker's rules, no two pieces of code can alter the configuration of a peripheral while the other still has access to it. Combining the output of svd2rust with the traits from embeddedhal, results in a device-specific HAL crate. These abstractions can be used to set up any platform-agnostic drivers based on embeddedhal. For the STM32F3xx family of CPUs, there's the stm32f3xx_hal HAL crate. Why am I telling you all this? Because these tools are the bedrock of the application we're going to make. Let's have a look, shall we? Let's get cookin'! Before we begin: Setting up a development environment Apart from obtaining the materials, we need to set up the development environment. This blog is not big enough to go into the details. If you want to try the code, please take a look at the Rust Discovery book. My advice is to read chapter 3 thoroughly. Apart from the tools described there, I'm using Visual Studio Code with the Rust Analyzer plugin, which provides me with code completion and other nice things. Step Blink an LED Tutorials always start with a 'Hello World' example. This tutorial is no different. Except it ends there too. The embedded equivalent of 'Hello World' is a blinking LED. Fortunately, the Discovery board comes with a couple of LEDs soldered on. Easy as it may seem, blinking an LED can be done in very complicated manners. For this step, we just want to get some code running on the CPU. We'll take the quick'n'dirty way. When we're done, we'll have some basic pin set-up code, and an endless loop which switches one of the LEDs on and off, doing some busy waiting in between. This is what that looks like: // The nostd and nomain directives you read about. #![no_std] #![no_main] // Import a directive that marks the entry point of the application. use cortexmrt::entry; // Import the embedded_hal trait implementations for the STM32F303. use stm32f3xx_hal::prelude::*; // NOP means No-Op. It's an operation that does nothing. // We use it in the busy waiting loops to notify Rust that it // should not optimize these loops out, as we actually don't want // to do anything for a while. use cortex_m::asm::nop; // A panic handler is run when the application encounters an error // it cannot recover from. The handler defines what it should do // in that case. #[panic_handler] unsafe fn panichandler(info: &core::panic::PanicInfo) -> ! { // On panic, just go do nothing for eternity, // or at least until device reset. loop {} } // This is the main function, or entrypoint of our applicaton. #[entry] fn main() -> ! { // Get a handle to the peripherals. Safe Rust allows only a single instance // of this handle. That way, accidental concurrent access is avoided. let peripherals = stm32f3xx_hal::stm32::Peripherals::take().unwrap(); // Reset and clock control register. Among other things, this register // is for enabling the General Purpose Input/Output peripherals. // We constrain full access to the RCC, allowing access per part instead. // Individual modules can configure individual parts of the RCC // independently from now on. // This gets more important in larger applications. let mut rcc = peripherals.RCC.constrain(); // The compass LEDs are all connected to the GPIO E peripheral. // Splitting the GPIO provides access to each of the individual pins, // so we can configure each of them independently. let mut gpioe = peripherals.GPIOE.split(&mut rcc.ahb); // The Northern LED is connected to pin pe9. To use it four our purpose, // we need to configure it to a push-pull output. // Import the parts of the type that describe pin pe9 when it's in // push-pull output mode. use stm32f3xx_hal::gpio::{gpioe, Output, PushPull}; let mut led: gpioe::PE9> = gpioe .pe9 .intopushpull_output(&mut gpioe.moder, &mut gpioe.otyper); // Loop forever loop { // Wait a couple of cycles for _ in 0..100_000 { nop(); } // Enable the LED led.set_high().unwrap(); // Wait some more for _ in 0..100_000 { nop(); } // Disable the LED led.set_low().unwrap(); } } Now, while Rust is pretty good at guessing which type my variables should be, I have explicitly noted the type of led: gpioe::PE9>. While that might look very scary to you, it actually illustrates what is so nice about the Rust type system in embedded context. Every pin is of its own type: gpioe::PE9. And that type is generic over the pin mode: Output. This type provides the methods sethigh and setlow. But the application won't compile if I try to read the pin's state. It should have been configured as an input for that. What's more, as a driver library writer, I can enforce my users to pass only correctly initialized peripheral references to my functions. Rust will statically check whether this is done correctly. How thoughtful of Rust to just let us know when we're doing something silly! Conclusion Using Rust for small examples like blinking an LED might seem overkill. But it does illustrate nicely how Rust's type system helps in correctly setting up device peripherals. In larger applications, this becomes a much bigger pro. Especially when multiple people are going to be writing it. And when you want to re-use the code. The Rust compiler being the strict teacher that it is, helps avoid notorious memory safety bugs. As debugging and embedded device is a lot harder than inspecting, say, a browser application, this is an essential feature in our context. And ultimately, development speed will probably be higher than it would be with C. Rust is young. It's becoming mature over time, but many things you can do with C cannot be done in Rust yet. But with devices like those from the STM32F3 family, which are pretty well supported by Rust, we think that Rust is a great language to write your embedded applications in.

Making embedded robust with Rust

Lars
Lars

Embedded software has an issue that most software doesn't: It can be very hard to get it patched. Sometimes a device hangs 5 meters high on a street light in the middle of a highway in another country. Sometimes a device is attached to a customer's heart. Sometimes strict validation requirements make changes to the software very expensive. In each case it is important to build software that doesn't fail, even in unpredictable conditions. Recently, we had the opportunity to write the software for a medical device. We had only a few months to write it before it needed to pass IEC 62304 validation. In the final phase of the project, the device was already working but we hadn't spent much time on error handling yet. In order to make our code as robust as possible and have it pass validation, we took the following approach to error handling. Categorizing errors In order to make our software robust, we've already been using the Rust programming language from the start. It has a strict type system that helps avoid many potential errors by checking them before we ever run the code. We still needed to handle hardware errors though. We began by categorizing these errors by their severity and recovery methods. IO errors Some errors are the result of simple operations that are expected to fail occasionally. These are usually operations that involve communication with a hardware device that can be affected by electrical noise. When communication with a peripheral fails, the operation can simply be retried immediately and it will usually succeed. For the case it keeps failing, we give these operations a retry limit as well as a time limit. When those are exceeded, we consider the component to have failed. Component errors Errors that can't be solved by simply retrying the operation are usually handled by resetting and reinitializing the relevant components. This can be expensive, so when you're on limited battery power, it's good to wait a few seconds before reinitializing. This both helps the remaining parts to stabilize and prevents an expensive reset loop from eating the entire battery. If the component is critical for the purpose of the device, the reset should be attempted indefinitely until the power runs out, but if the device can keep functioning without it, it should just give up at some point and leave the component disabled. Fatal errors The severest kinds of errors are the kinds that can only be resolved by power cycling or not at all. In some cases it's best to shut down and wait for the device to be retrieved for repair. In our case, we wanted to keep trying until the battery runs out, since we wouldn't need it for anything else. These kinds of errors usually occur when the software or a peripheral ends up in a bad state. For example, an SD card can switch to inactive mode if initialization fails and there is no way to get it out of that state without power cycling it. The ability to turn the power off for a peripheral cannot be taken for granted in an embedded setting. The hardware usually needs to be designed to include a software controllable power switch, or multiple, one for each peripheral. Without these, recovery from fatal errors may not be possible without human intervention. It can also be useful to turn on the watchdog peripheral, which reboots the device after a timeout. As long as the device keeps passing through its main loop, it will avoid the reboot by regularly bumping the timeout. If on the other hand it gets stuck somewhere and fails to bump the timeout for a while, it will be reset, breaking the cycle. Don't panic In Rust, most errors are reported using result types. This means the result of a function can indicate either success or some kind of error. Rust makes it easy to bubble these errors up to the points where we want to deal with them and it warns us if we ignore them. However, some sources of errors are too common to encode in the type system. Indexing an array is one example. If an index is out of range, instead of returning an error that needs to be handled all throughout the type system, Rust panics. This is a type of error that is not meant to be recoverable. On server applications, the application can usually log an error message before failing, but on embedded systems, there is usually nothing to report to, so all the device can do is reboot. Rebooting can be expensive because every component will need to be reinitialized, so it's best to avoid that. Unfortunately, Rust's type system does not encode whether a function can panic or not. The best way to find sources of panics is to search for the functions that are usually used to cause them. The most common ones are panic, unwrap, assert and expect. When we give our code a robustness pass, we search for these keywords and replace each call with one of the error handling tactics from above. This method won't help with panics caused by indexing though. Currently, the only way to protect against those is careful writing and rigorous testing. We're looking forward to the moment a feature called "const generics" becomes available in Rust, because it extends the type system so that many potential indexing errors can be proven impossible. Reporting errors Thanks to our efforts to attempt recovery from every error, we don't expect the devices to brick themselves over something minor. Nonetheless, we would still like to investigate any errors they do encounter. Our devices are offline, so they can't just send us a crash report. Instead, they log their sensor data to an SD card which is read out near the end of its life cycle. Ideally, we'd write entire stack traces to the SD card. In practice, we're on a limited power budget and SD cards use a lot of power, so we don't want to write too much. We opted to give every location in our code that can fail its own unique error tag from a single large enum. These errors only cost us 3 bytes to write. Using a single error type throughout our code base also limits the amount of error conversion code we need to write. In some cases a unique location in the source code still doesn't give us enough information. Usually, there is a high-level operation that failed and a low-level cause. In these cases, we simply log two errors in a row, making a mini stack trace of 6 bytes. We also need to watch out that we don't report 6 bytes of errors every millisecond whenever recovery fails. To solve that, we limit the number of errors that can be written in a given time period. In order to not miss out completely on errors after the limit, we keep track of flags for the peripherals and severities that those errors occured at. Concluding remarks By combining cheap yet sufficiently detailed error logging with appropriate recovery tactics, we've built a robust product that is unlikely to fail and easy to debug if it does. By relying on unwrap initially, we made it easy for ourselves to find most error sources. If you have any thoughts about this approach or would like our help developing your project, get in touch with me or Hugo. \[1\] Const generics have already been in the works for more than two years. https://github.com/rust-lang/rust/issues/44580