June 6, 2024

Mix in Rust

Henk

Embedded software engineer

What does it actually mean to introduce Rust in an existing project and have it communicate with the other languages in the code base? This article launches a series of blog posts that provide guidance for introducing Rust into your code base step by step.

This article is part of our Rust Interop Guide.

Imagine yourself in the following situation. You came across Rust a while ago and you like what you see. You've done some learning and hobby projects in Rust, got pretty good at it, and are wondering if you could somehow use Rust in the huge software project you're working on during your day job. After all, Rust's promises are hard to ignore: apart from Rust being blazingly fast and memory-efficient, its type system and ownership model guarantee memory safety and thread safety. Being a sane person, you understand that 'Rewrite it in Rust' is not a realistic answer. You'd like to oxidize the code base: gradually introduce Rust in the application where possible. And when you're setting up a new application, you'd like to use foreign dependencies from your Rust executable.

You're facing quite the challenge here. You'll quickly find that adding Rust to your code base is not as trivial as you'd hoped it would be. But we're here to help you tackle it. With this series of blog posts, we want to provide some guidance with which you can set up a plan to introduce Rust in your projects. In order to do that, you'll need to know what it actually means to do so. First off, we'll have a look at how Rust does interop in general, and in the following posts, we will go over some fantastic tools that make the finicky world of interop much prettier.

How does Rust talk with non-Rust?

It's obvious that programming languages differ greatly in syntax. If you try to compile C code using the Rust compiler, it's going to get confused, to say the least. But even if this were somehow not a problem, programming languages differ in semantics as well. That is the hard part. To declare a variable in Rust means something else than to declare a variable in C. To call a function in Rust means something different than to call a function in Java. And to define a type in Rust means something different than to define a type in Python. Languages make different trade-offs. The devil's in the details.

It may sound surprising, but Rust does not guarantee how its types are laid out in memory. Even between two compiler invocations, the memory layout of your struct may differ. Apart from primitives, arrays, and string slices, you can't assume that the bytes in memory are ordered in a specific way. By default, Rust retains the right to reorder struct and enum fields in order to make your application more efficient. In the same vein, Rust hasn't settled on a stable calling convention: it doesn't promise anything about the way individual function calls work. This allows Rust to optimize calls, which makes for very performant code.
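To make the layout point a bit more tangible, here is a minimal sketch that compares the size of the same three fields under Rust's default representation and under the C representation. The struct names and the `layout_sizes` helper are made up for this illustration.

```rust
use std::mem::size_of;

// Default representation: the compiler is free to reorder these fields.
#[allow(dead_code)]
struct DefaultLayout {
    a: u8,
    b: u32,
    c: u8,
}

// C representation: fields stay in declaration order and follow C's padding rules.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

/// Returns (size with the default layout, size with the C layout) in bytes.
fn layout_sizes() -> (usize, usize) {
    (size_of::<DefaultLayout>(), size_of::<CLayout>())
}
```

On common targets where a `u32` is 4-byte aligned, the C layout needs 12 bytes because of the padding around `b`. With current compilers the default layout often comes out smaller, but nothing promises that, which is exactly the point.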

Well, then how do any two programming languages communicate? The trick is the same as the one I'm pulling by writing this blog post in English, while I'm a Dutch native speaker. And anyone who can read English is able to follow me, even non-native speakers. You use some kind of commonly known language.

In programming, the lingua franca is C. Every platform defines a stable Application Binary Interface, or ABI, for C, so on a given target its type layout and calling conventions are well-known. Every self-respecting programming language has some means of adhering to the C ABI. Of course, Rust is no exception. But the C ABI was made to support C: a relatively simple programming language that has been around for ages. As such, when speaking C ABI, we can't express everything there is to say about Rust types. Just like how there's no good English translation of the Dutch word 'gezellig'. There are no generics, no fat pointers, no trait objects, no notion of ownership or reference lifetimes, no async/await, no impl Drop. We need to communicate using primitives, pointers, structs, and unions. And through documentation. Lots of it. But, spoiler: it's totally possible.

What does that look like?

To have Rust represent data and call functions as prescribed in the C ABI, we have a number of tools. First off, we have extern "C" fn to define Rust functions that can be called using the C calling convention, and can therefore be used from C:

#[no_mangle]
extern "C" fn say_hello() {
    println!("Hello there!");
}

The #[no_mangle] attribute here ensures the compiler doesn't mangle the function name, so that the function name is stable and discoverable for linking. Then there's extern "C" {} to declare functions that have been declared externally and can be called from Rust using the C calling convention:

extern "C" {
    fn add(lhs: u32, rhs: u32) -> u32;
}
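As a small sketch of what using such a declaration looks like, the example below declares `strlen` from the C standard library, which Rust's standard library already links against on common platforms, and wraps it in a safe helper. The helper name `c_string_length` is made up for this illustration, and `usize` stands in for C's `size_t` on the assumption that they match on your target. The block is written as `unsafe extern "C"`, which recent toolchains accept and the 2024 edition requires; on older toolchains you would write plain `extern "C"`.

```rust
use std::ffi::{c_char, CString};

// Declaration of a function that is defined in the C standard library.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper: returns the length C sees, or `None` if `text` contains
/// an interior NUL byte and so cannot be turned into a C string.
fn c_string_length(text: &str) -> Option<usize> {
    // CString appends the terminating NUL that C expects.
    let c_text = CString::new(text).ok()?;
    // SAFETY: `c_text` is a valid NUL-terminated string that outlives the call.
    Some(unsafe { strlen(c_text.as_ptr()) })
}
```

Calling a foreign function is always `unsafe`, because the compiler has to take our word for it that the declaration matches the real function.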

But when declaring functions this way, we need to ensure that the data we pass as arguments, as well as the returned data, can be represented with the C ABI. Luckily, there are ways. Rust's primitives map pretty well to C's. An f32 is a float in C, an i64 is a C int64_t (a long long on common platforms), a u8 is a C unsigned char. So the basics are covered by the Rust creators defining these data types like C does. If your struct consists only of fields with primitive types, you can slap on #[repr(C)] to have Rust use the C representation for this struct like so:

#[repr(C)]
struct Point {
    x: f32,
    y: f32
}

Pretty straightforward. We can do something similar with enums that don't hold any data in their variants:

#[repr(u8)]
enum Color {
    R,
    G,
    B
}

By annotating it with #[repr(u8)], we force Rust to represent each of the enum variants as a single byte. Furthermore, if your struct is just a simple wrapper around some other type, you can annotate it with the #[repr(transparent)] attribute to force Rust to lay out the wrapper exactly the same way as the type being wrapped. For example:

#[repr(transparent)]
struct Wrapper<T>(T);

Now if T has a defined layout, then so does Wrapper<T>. Neat.
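If you'd like to check what these attributes give you, the standard library can report sizes and field offsets. The sketch below repeats the three types from above and adds two helpers that are made up for this illustration: `point_layout`, and `color_from_u8`, which shows one way to turn a byte received from the other side back into a `Color` without trusting it blindly.

```rust
use std::mem::{offset_of, size_of};

#[allow(dead_code)]
#[repr(C)]
struct Point {
    x: f32,
    y: f32,
}

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    R,
    G,
    B,
}

#[allow(dead_code)]
#[repr(transparent)]
struct Wrapper<T>(T);

/// Returns (size of Point, offset of x, offset of y) in bytes.
fn point_layout() -> (usize, usize, usize) {
    (size_of::<Point>(), offset_of!(Point, x), offset_of!(Point, y))
}

/// Converts a raw byte into a Color, rejecting values that match no variant.
fn color_from_u8(raw: u8) -> Option<Color> {
    match raw {
        0 => Some(Color::R),
        1 => Some(Color::G),
        2 => Some(Color::B),
        _ => None,
    }
}
```

Going from `Color` to its byte is a plain cast, `Color::G as u8`. The other direction needs the explicit check, because a byte coming from C can hold any value, and creating an enum with an invalid discriminant is undefined behavior.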

However, as mentioned, we need to be more creative if we want to use more elaborate types, like Result<T, E>, String, or fat pointers like &[T]. If your type has special rules, then you can't simply annotate your structs and call it a day. For instance, to pass a &[u32], we have to keep in mind the way it is represented in Rust: two words, one for a pointer to the start of the slice, and one for the length:

struct ImaginaryU32Slice {
    start: *const u32,
    len: usize
}

Typically in C, you'd pass both fields as separate arguments. So your extern "C" fn may look like this:

#[no_mangle]
extern "C" fn takes_slice(data: *const u32, len: usize) {
    /* do stuff */
}
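To give an idea of what the "do stuff" part could look like, here is a minimal sketch that turns the pointer and length back into a slice. The function names are made up for this illustration, and the export attribute is left out because the function is only called from Rust here. The function is marked `unsafe` since it has to trust the caller about the pointer.

```rust
use std::slice;

/// Sums `len` values starting at `data`. Returns 0 if `data` is null.
///
/// # Safety
/// If `data` is non-null, it must point to `len` valid, properly aligned
/// `u32` values that are not modified while this function runs.
unsafe extern "C" fn sum_u32(data: *const u32, len: usize) -> u64 {
    if data.is_null() || len == 0 {
        return 0;
    }
    // SAFETY: the caller promises that `data` points to `len` readable values.
    let values = unsafe { slice::from_raw_parts(data, len) };
    values.iter().map(|&value| u64::from(value)).sum()
}

/// Rust-side caller that splits a slice into the two arguments C would pass.
fn sum_from_rust(values: &[u32]) -> u64 {
    // SAFETY: pointer and length come from the same live slice.
    unsafe { sum_u32(values.as_ptr(), values.len()) }
}
```

The null check is the only thing we can verify ourselves. Whether the length is right and the memory is still alive is something only the caller knows.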

Which seems reasonable, but it introduces a nasty problem: we have to define the representation by hand. That's doable for slices, but how about strings? A Rust String is guaranteed to be UTF-8 encoded, while a C string is guaranteed to end with a nul byte. Which kind do we pick? What does it even mean to expose generic functions? Furthermore, we're unable to express ownership, so rules about who frees which memory allocation, how data is represented, and more must be documented very clearly and unambiguously. That documentation then needs to be read thoroughly by the user, interpreted as intended, and implemented and maintained correctly. Whew!
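One possible answer to the string question, sketched below, is to speak nul-terminated strings at the boundary and convert with `CStr` and `CString` from the standard library. The functions `greeting_new`, `greeting_free`, and `greet` are made up for this illustration. Note how the ownership rule lives only in the doc comments: the type system can't enforce it for a C caller.

```rust
use std::ffi::{c_char, CStr, CString};
use std::ptr;

/// Builds a greeting for `name`. Returns null if `name` is null or not UTF-8.
/// The caller owns the returned string and must release it with `greeting_free`.
///
/// # Safety
/// `name` must be null or point to a valid NUL-terminated string.
unsafe extern "C" fn greeting_new(name: *const c_char) -> *mut c_char {
    if name.is_null() {
        return ptr::null_mut();
    }
    // SAFETY: the caller promises a valid NUL-terminated string.
    let name = match unsafe { CStr::from_ptr(name) }.to_str() {
        Ok(name) => name,
        Err(_) => return ptr::null_mut(),
    };
    match CString::new(format!("Hello, {name}!")) {
        // `into_raw` hands ownership of the allocation to the caller.
        Ok(greeting) => greeting.into_raw(),
        Err(_) => ptr::null_mut(),
    }
}

/// Releases a string returned by `greeting_new`. Null is ignored.
///
/// # Safety
/// `greeting` must be null or a pointer from `greeting_new` that was not freed yet.
unsafe extern "C" fn greeting_free(greeting: *mut c_char) {
    if !greeting.is_null() {
        // SAFETY: the pointer came from `CString::into_raw`, so we may take it back.
        drop(unsafe { CString::from_raw(greeting) });
    }
}

/// Rust-side round trip, playing the role of the foreign caller.
fn greet(name: &str) -> Option<String> {
    let c_name = CString::new(name).ok()?;
    // SAFETY: `c_name` is valid, and the result is freed exactly once.
    unsafe {
        let raw = greeting_new(c_name.as_ptr());
        if raw.is_null() {
            return None;
        }
        let text = CStr::from_ptr(raw).to_string_lossy().into_owned();
        greeting_free(raw);
        Some(text)
    }
}
```

The string has to be freed by the same side that allocated it, which is why a matching `greeting_free` is exposed rather than leaving the C caller to call `free` on it.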

If you really want to use Rust to its full extent when mixing it with another language, then either you'll have to manually write wrappers, or, if you're lucky, use some kind of smart tool to generate glue code. Either way, you'll need knowledge of the domain of your application, as well as both Rust and the language you want it to talk to. That is hard. And there's not a lot of material out there that covers this.
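To show what a hand-written wrapper can look like, here is a minimal sketch of a common pattern: the Rust type stays opaque to the other side, which only ever holds a pointer to it and calls functions to create, use, and destroy it. The `Counter` type and all function names are made up for this illustration.

```rust
/// A plain Rust type. The foreign side never looks inside it.
pub struct Counter {
    count: u64,
}

/// Allocates a counter on the heap and hands ownership to the caller.
extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { count: 0 }))
}

/// Increments the counter and returns the new value, or 0 for a null pointer.
///
/// # Safety
/// `counter` must be null or a live pointer obtained from `counter_new`.
unsafe extern "C" fn counter_increment(counter: *mut Counter) -> u64 {
    // SAFETY: the caller promises the pointer is null or valid and not aliased.
    match unsafe { counter.as_mut() } {
        Some(counter) => {
            counter.count += 1;
            counter.count
        }
        None => 0,
    }
}

/// Destroys a counter created by `counter_new`. Null is ignored.
///
/// # Safety
/// `counter` must be null or a pointer from `counter_new` that was not freed yet.
unsafe extern "C" fn counter_free(counter: *mut Counter) {
    if !counter.is_null() {
        // SAFETY: the pointer came from `Box::into_raw`, so we may take it back.
        drop(unsafe { Box::from_raw(counter) });
    }
}

/// Rust-side round trip, playing the role of the foreign caller.
fn count_to(steps: u64) -> u64 {
    let counter = counter_new();
    let mut last = 0;
    for _ in 0..steps {
        // SAFETY: `counter` is live until `counter_free` below.
        last = unsafe { counter_increment(counter) };
    }
    // SAFETY: `counter` came from `counter_new` and is freed exactly once.
    unsafe { counter_free(counter) };
    last
}
```

Because the other side only sees a pointer, `Counter` can keep using any Rust feature internally. The price is that every operation needs its own exported function, and that the create/use/free contract once again lives in documentation.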

Does it get better?

Hopefully. Recently, it was announced that the Rust Foundation received $1M from Google to support its C++/Rust interop initiative. That's great news for people interested in using Rust with C++, and it might also result in Rust's FFI story improving in general. We'll see.

Then there are efforts to define a stable Rust ABI. That would allow us to express ourselves in a much more complete manner. One of them is crabi, which is currently just an RFC. Amanieu d'Antras gave a talk on the path to a stable Rust ABI, which takes inspiration from Swift.

Neither effort is usable today, however. And once they are, we'll still need to create support for them in other languages. But the glue-code-generating smart tools are out there today. They can make your life significantly better.

In the following posts, we'll cover per language which tools make interop with Rust better, and how you use them. The first language we'll tackle is C. And there's a lot to say about interop with that language. Follow our RSS feed to stay posted on the next update!

All code examples from the Rust Interop Guide can be found in this repo.
