September 17, 2024

Mix in Rust with C++

Henk

Embedded software engineer

This article will help you to slowly introduce some Rust into your C++ project. We'll familiarize ourselves with the tooling and go through some examples.

This article is part of our Rust Interop Guide.

So, the other day I read the following in a book about a certain programming language:

[...] is a general-purpose programming language emphasizing the design and use of type-rich, lightweight abstractions. It is particularly suited for resource-constrained applications, such as those found in software infrastructures. [...] rewards the programmer who takes the time to master techniques for writing quality code. [...] is a language for someone who takes the task of programming seriously. Our civilization depends critically on software; it had better be quality software.¹

In a post on a blog that features mostly Rust-centered articles, what is, of course, the name of the language I left out? That's right: C++. Maybe the part on taking programming seriously gave it away. Anyway.

It's not so strange that there's a lot of interest in Rust from the C++ space: both languages operate in the same niche. Both languages try to make systems-programming more scalable and less error-prone. It's no secret that Rust has taken a lot of inspiration from C++: the memory ordering model for atomics was shamelessly copied from C++20's, and the RAII idiom, which originated from C++, is used extensively in Rust.

Apart from just copying stuff, Rust aims to improve on the status quo in systems programming. The focus on correctness and memory safety has made many a C++ developer curious. But, as always in this blog series, the question is: what if you want to oxidize your C++ project? What if you want to slowly introduce Rust in your C++ code base?

In this post, I'll help you get going by going through an example. I will, however, assume you've read the series introduction and the post on calling Rust from C. We'll start out with a rather simple project here; follow-up articles are planned to cover a second small project and a larger Qt-based project in which a Rust crate is used to handle image manipulation.

The tooling

As with other languages, tools exist to improve interoperability between Rust and C++, compared to hand-rolled C-ABIs. A major player in this department is cxx, written by, you guessed it: David Tolnay, the developer behind widely-used crates like cargo-expand, anyhow, and syn. cxx promises 'a safe mechanism for calling C++ code from Rust and Rust code from C++'. Nice.

cxx takes a different route to doing interop with C++ than PyO3 does with Python, and for good reason. Whereas PyO3 allows you to verbalize all kinds of Python concepts from Rust, cxx covers just those concepts that are present in both C++ and Rust. Those concepts can be translated between the languages with 'zero or negligible overhead', and low overhead is important in systems programming contexts. What's more, cxx is about 'carving out a highly expressive set of functionality about which we can make powerful safety guarantees today and extend over time'. So what you get is a safe and speedy FFI boundary.

There is a cost to this, however: you pay in expressiveness. It'll take some time to get used to having to massage your problems into the common subset cxx supports. Let's have a look!

Cooking some hash

We'll get started with a rather simple example: a C++ application that uses a Rust implementation for calculating a CRC32 hash for a given input. It'll read bytes from stdin until a newline is encountered, and write the CRC32 hash of those bytes to stdout as a hexadecimal number:

$ cat hello.txt | cxx_crc32fast
1cf81ca7

Let's start out by quickly setting up a Rust library that can do this:

$ cargo new cxx-crc32fast --lib && cd cxx-crc32fast
$ cargo add crc32fast@1.4

We'll leave the actual CRC32 calculation up to the crc32fast crate. Our crate will take care of exposing the functionality to C++. crc32fast exposes two APIs: the crc32fast::hash function, and the crc32fast::Hasher struct. The former is a very easy way to hash a bunch of bytes: simply pass it a slice and it'll get to work. The downside is that crc32fast::hash needs all the data at once. And we (or at least I) want our implementation to support hashing data in chunks. For that, we'll make use of crc32fast::Hasher. To make crc32fast::Hasher play nice with cxx, we'll have to wrap it and expose some of its functionality. Here's a first stab at it:

struct Hasher(crc32fast::Hasher);

impl Hasher {
    fn new() -> Self {
        Self(crc32fast::Hasher::new())
    }
    
    fn update(&mut self, buf: &[u8]) {
        self.0.update(buf)
    }

    fn finalize(self) -> u32 {
        self.0.finalize()
    }
}

Great! We can now create an instance of our own Hasher, and call the update and finalize methods of the crc32fast::Hasher, simply by forwarding to the corresponding methods. Easy enough. Let's add some cxx sauce to our dish.
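
To make the difference between one-shot and chunked hashing concrete, here is a minimal sketch that uses only the standard library. Crc32 is a hypothetical, bit-by-bit stand-in written for illustration, not crc32fast's implementation (which is far faster), and the names hash and hash_reader are assumptions as well. The only point is that feeding the data in pieces yields the same result as hashing it all at once.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a streaming CRC32 hasher (not crc32fast's code).
#[derive(Clone)]
pub struct Crc32 {
    state: u32,
}

impl Crc32 {
    pub fn new() -> Self {
        Crc32 { state: 0xFFFF_FFFF }
    }

    /// Feed another chunk of data into the running checksum.
    pub fn update(&mut self, buf: &[u8]) {
        for &byte in buf {
            self.state ^= u32::from(byte);
            for _ in 0..8 {
                // All ones if the lowest bit is set, all zeros otherwise.
                let mask = (self.state & 1).wrapping_neg();
                // 0xEDB88320 is the reflected CRC-32 (IEEE 802.3) polynomial.
                self.state = (self.state >> 1) ^ (0xEDB8_8320 & mask);
            }
        }
    }

    /// Consume the hasher and return the checksum.
    pub fn finalize(self) -> u32 {
        !self.state
    }
}

/// One-shot API: needs all the data at once.
pub fn hash(buf: &[u8]) -> u32 {
    let mut hasher = Crc32::new();
    hasher.update(buf);
    hasher.finalize()
}

/// Chunked API: hash whatever a reader yields, 4096 bytes at a time.
pub fn hash_reader<R: Read>(mut reader: R) -> io::Result<u32> {
    let mut hasher = Crc32::new();
    let mut chunk = [0u8; 4096];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break; // end of input
        }
        hasher.update(&chunk[..n]);
    }
    Ok(hasher.finalize())
}
```

Hashing the ASCII string 123456789 should give cbf43926, the standard check value for this CRC variant, no matter how the input is split over update calls.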

Water under the bridge

The cxx crate allows you to specify the way Rust functionality gets exposed to C++ and vice versa. You do this using the #[cxx::bridge] procedural macro, which you invoke on a module that defines what is exposed and how. Here's what it looks like:

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        /* Stuff defined in Rust, exposed to C++ */
    }

    unsafe extern "C++" {
        /* Stuff defined in C++, exposed to Rust */
    }
}

In this post, we'll focus on the extern "Rust" block, in which we declare types and functions that are defined in Rust, and expose them to C++. That'll allow the C++ application to use them.

We would like to share the Hasher type with C++. cxx does support 'Shared types', which 'enable both languages to have visibility into the internals of a type'. Sounds good, let's give it a go.

First, we've got to add cxx as a dependency:

$ cargo add cxx@1

Then, set up the bridge, defining our Hasher as a Shared type:

#[cxx::bridge]
mod ffi {
    struct Hasher(crc32fast::Hasher);
}

And run a cargo check:

$ cargo check
    Checking cxx-crc32fast v0.1.0 (/home/hd/dev/tg/edu/cxx-crc32fast)
error: tuple structs are not supported

Ok, well, that might be easily fixable by simply using named fields:

#[cxx::bridge]
mod ffi {
    struct Hasher {
        hash: crc32fast::Hasher
    }
}

Try again:

$ cargo check
    Checking cxx-crc32fast v0.1.0 (/home/hd/dev/tg/edu/cxx-crc32fast)
error: unsupported type
 --> src/lib.rs:5:15
  |
5 |         hash: crc32fast::Hasher
  |               ^^^^^^^^^^^^^^^^^

Huh. So what type of fields does cxx support in shared structs? Well, for cxx to be able to validate the safe use of whatever goes over the FFI boundary, every field of a shared struct must be of a type that cxx itself knows how to represent in both languages. crc32fast::Hasher is an arbitrary Rust type defined in another crate, so cxx has no way to describe it to C++. Shared types are not going to solve our problems here.

As an alternative to shared types, cxx supports using opaque types in the bridge. Opaque types hide their fields and are accessible only via pointer indirection. You declare them in the extern "Rust" block inside the #[cxx::bridge] module, and cxx will look for a matching definition in your Rust code. Let's try that with our Hasher tuple struct:

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type Hasher;
    }
}

struct Hasher(crc32fast::Hasher);

impl Hasher {
    /* Methods omitted */
}

And cargo check the thing! Apart from some warnings about unused things, Rust is fine with the code so far.

Down below I will look at what is actually happening under the hood, but let's press on.
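
For a feel of what 'accessible only via pointer indirection' means, here is roughly how one might hand-roll an opaque handle over a plain C ABI, without cxx. Counter and the three functions are hypothetical names made up for this sketch; cxx generates its own, more careful glue.

```rust
/// Hypothetical opaque type: foreign callers never see its fields.
pub struct Counter {
    total: u64,
}

/// Allocate on the Rust heap and hand out a raw pointer as the handle.
pub extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { total: 0 }))
}

/// Borrow the value behind the handle for the duration of the call.
///
/// # Safety
/// `handle` must come from `counter_new` and must not have been passed
/// to `counter_finish` yet.
pub unsafe extern "C" fn counter_add(handle: *mut Counter, amount: u64) {
    let counter = unsafe { &mut *handle };
    counter.total += amount;
}

/// Take ownership back; the allocation is freed when the Box is dropped.
///
/// # Safety
/// Same requirements as `counter_add`; the handle is invalid afterwards.
pub unsafe extern "C" fn counter_finish(handle: *mut Counter) -> u64 {
    let counter = unsafe { Box::from_raw(handle) };
    counter.total
}
```

The caller only ever holds a pointer, so the layout of Counter can change without the other side noticing. The price is that nothing stops the caller from using the handle after counter_finish, which is the kind of mistake cxx's generated types are meant to rule out.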

impl Hasher

Okay, now let's expose the methods on Hasher as well. At the time of writing, cxx can't expose an associated function like Hasher::new directly, so we'll add a free function that calls Hasher::new instead:

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type Hasher;
        
        fn init() -> Hasher;
    }
}

struct Hasher(crc32fast::Hasher);

fn init() -> Hasher {
    Hasher::new()
}

/* - snip - */

As you can see, we declare the FFI boundary in the #[cxx::bridge], but implement it in its parent module. Aaand check:

$ cargo check -q
error: returning opaque Rust type by value is not supported
[...]

Ah yeah, of course. Opaque types can only be exposed behind some pointer indirection, typically to a heap-allocated value. Let's update our code to reflect this:

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type Hasher;
        
        fn init() -> Box<Hasher>;
    }
}

struct Hasher(crc32fast::Hasher);

fn init() -> Box<Hasher> {
    Box::new(Hasher::new())
}

All right, that seems to compile again. One fewer warning than before, even. Let's now finish our bridge by exposing Hasher::update and Hasher::finalize as well. This time, we're exposing methods instead of an associated function. Luckily, cxx does support exposing methods. Here's what the code looks like at this point:

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type Hasher;
        
        fn init() -> Box<Hasher>;
        
        fn update(&mut self, buf: &[u8]);
        
        fn finalize(self) -> u32;
    }
}

struct Hasher(crc32fast::Hasher);

fn init() -> Box<Hasher> {
    Box::new(Hasher::new())
}

impl Hasher {
    fn new() -> Self {
        Self(crc32fast::Hasher::new())
    }
    
    fn update(&mut self, buf: &[u8]) {
        self.0.update(buf)
    }

    fn finalize(self) -> u32 {
        self.0.finalize()
    }
}

As the extern "Rust" block in the bridge contains a single type declaration, cxx will assume the self parameters refer to this type. In case there is more than one type declaration, cxx allows you to disambiguate by declaring the type of the self parameter:

fn update(self: &mut Hasher, buf: &[u8]);

Now, let's again check this code:

$ cargo check -q
error: unsupported method receiver
  --> src/lib.rs:10:21
   |
10 |         fn finalize(self) -> u32;
   |                     ^^^^

error: could not compile `cxx-crc32fast` (lib) due to 1 previous error

Hmm. Looks like consuming methods are not (yet) supported. And in this case, that makes sense: as we're exposing Hasher only as an opaque type to C++, there's no way for C++ to pass a Hasher to Hasher::finalize() by value. We'll have to work around this. One way to do it is by having Hasher::finalize take a reference to self, and using crc32fast::Hasher's Clone implementation to invoke finalize on a cloned Hasher:

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type Hasher;
        
        /* - snip - */
        
        fn finalize(&self) -> u32;
    }
}

struct Hasher(crc32fast::Hasher);

/* - snip - */

impl Hasher {

    /* - snip - */

    fn finalize(&self) -> u32 {
        self.0.clone().finalize()
    }
}

And cxx is totally happy! In this case, our Hasher is cheap to clone, so this is not even a bad workaround. But you won't always be this lucky. To support calling a consuming method, we'll again expose finalize as a free function, and have it forward to the corresponding method on Hasher. As init returns a Box<Hasher>, it makes sense to allow C++ to pass that to finalize. Here's the result:

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type Hasher;
        
        fn init() -> Box<Hasher>;
        
        fn update(&mut self, buf: &[u8]);
        
        fn finalize(h: Box<Hasher>) -> u32;
    }
}

struct Hasher(crc32fast::Hasher);

fn init() -> Box<Hasher> {
    Box::new(Hasher::new())
}

fn finalize(h: Box<Hasher>) -> u32 {
    h.finalize()
}

impl Hasher {
    fn new() -> Self {
        Self(crc32fast::Hasher::new())
    }
    
    fn update(&mut self, buf: &[u8]) {
        self.0.update(buf)
    }

    fn finalize(self) -> u32 {
        self.0.finalize()
    }
}

And now cargo check has nothing more to complain about!
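
The two workarounds can be compared side by side without any cxx involved. Below is a small sketch in plain Rust; ByteSum is a hypothetical stand-in for the wrapped hasher that merely sums bytes, and finalize_cloned and finalize_boxed are names chosen for this example.

```rust
/// Hypothetical stand-in for the wrapped hasher: it just sums bytes.
#[derive(Clone, Default)]
pub struct ByteSum(u32);

impl ByteSum {
    pub fn update(&mut self, buf: &[u8]) {
        for &byte in buf {
            self.0 = self.0.wrapping_add(u32::from(byte));
        }
    }

    /// Consuming method: `self` is moved in and cannot be used afterwards.
    pub fn finalize(self) -> u32 {
        self.0
    }

    /// Workaround 1: borrow, clone, and consume the clone.
    /// The original stays usable, at the cost of one clone per call.
    pub fn finalize_cloned(&self) -> u32 {
        self.clone().finalize()
    }
}

/// Workaround 2: a free function that takes ownership of the Box.
pub fn finalize_boxed(h: Box<ByteSum>) -> u32 {
    // Calling a `self` method through a Box moves the value out of it,
    // and the heap allocation is released afterwards.
    h.finalize()
}
```

With the first workaround the value can still be updated after finalizing; with the second, the compiler rejects any use of the Box after it has been passed to finalize_boxed.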

Exposing this simple piece of code has been quite tricky. We've seen a bunch of errors we had to work around. This is the price you pay for safe and fast FFI when using C++. You need to really hammer your problem into the jig of cxx sometimes. But you do get back nice things.

The other side

Right, now that the Rust side of things seems to be complete, let's get some C++ going for us. First, let's configure our crate such that the C++ side of our bridge gets generated. For that, we'll add the cxx-build crate as a build dependency:

$ cargo add cxx-build@1 --build

Next up, let's create a build.rs in our crate root, in which we invoke cxx-build:

fn main() {
    cxx_build::bridge("src/lib.rs")
        .compile("cxx-crc32fast");

    println!("cargo:rerun-if-changed=src/lib.rs");
}

This will have cxx-build generate the C++ bindings to our bridge, and then compile the thing. After a cargo build, you can find the generated C++ in target/cxxbridge:

target/cxxbridge/
├── cxx-crc32fast
│   └── src
│       ├── lib.rs.cc -> ../../../debug/build/cxx-crc32fast-6147c3641f4e46e4/out/cxxbridge/sources/cxx-crc32fast/src/lib.rs.cc
│       └── lib.rs.h -> ../../../debug/build/cxx-crc32fast-6147c3641f4e46e4/out/cxxbridge/include/cxx-crc32fast/src/lib.rs.h
└── rust
    └── cxx.h -> /path/to/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cxx-1.0.124/include/cxx.h

A bunch of symlinks to headers and code. cxxbridge/rust/cxx.h declares the C++ counterparts of cxx's built-in types, such as rust::Box and rust::Slice. The more interesting stuff is in cxxbridge/cxx-crc32fast/src/lib.rs.*. This is the C++ translation of our cxx bridge. For instance, at the end of lib.rs.h, the Hasher is declared:

struct Hasher;

#ifndef CXXBRIDGE1_STRUCT_Hasher
#define CXXBRIDGE1_STRUCT_Hasher
struct Hasher final : public ::rust::Opaque {
  void update(::rust::Slice<::std::uint8_t const> buf) noexcept;
  ~Hasher() = delete;

private:
  friend ::rust::layout;
  struct layout {
    static ::std::size_t size() noexcept;
    static ::std::size_t align() noexcept;
  };
};
#endif // CXXBRIDGE1_STRUCT_Hasher

::rust::Box<::Hasher> init() noexcept;

::std::uint32_t finalize(::rust::Box<::Hasher> h) noexcept;

We can see the init and finalize functions we declared in our bridge, and Hasher has a method named update, which takes a rust::Slice of bytes. Another interesting detail is that the Hasher class inherits from the ::rust::Opaque class, which is defined as follows:

class Opaque {
public:
  // Constructor
  Opaque() = delete;
  // Copy Constructor
  Opaque(const Opaque &) = delete;
  // Destructor
  ~Opaque() = delete;
};

As you can see, anything Opaque can be neither created nor destroyed on the C++ side of the bridge. Its allocation is managed by the Rust side instead. Furthermore, the implementations of init, finalize and Hasher::update simply forward to the Rust side. Here's what that looks like:

::rust::Box<::Hasher> init() noexcept {
  return ::rust::Box<::Hasher>::from_raw(cxxbridge1$init());
}

void Hasher::update(::rust::Slice<::std::uint8_t const> buf) noexcept {
  cxxbridge1$Hasher$update(*this, buf);
}

::std::uint32_t finalize(::rust::Box<::Hasher> h) noexcept {
  return cxxbridge1$finalize(h.into_raw());
}

By the way

I thought it'd be cool to have a look at the expansion of the #[cxx::bridge] macro. With cargo-expand installed, run the following:

$ cargo expand ffi

Here's what it spits out for our Hasher struct without the impl block:

/* attrs omitted */
mod ffi {
    use super::Hasher;
    #[doc(hidden)]
    unsafe impl ::cxx::private::RustType for Hasher {}
    #[doc(hidden)]
    const _: () = {
        let _ = {
            fn __AssertUnpin<
                T: ?::cxx::core::marker::Sized + ::cxx::core::marker::Unpin,
            >() {}
            __AssertUnpin::<Hasher>
        };
        {
            #[doc(hidden)]
            #[allow(clippy::needless_maybe_sized)]
            fn __AssertSized<
                T: ?::cxx::core::marker::Sized + ::cxx::core::marker::Sized,
            >() -> ::cxx::core::alloc::Layout {
                ::cxx::core::alloc::Layout::new::<T>()
            }
            #[doc(hidden)]
            #[export_name = "cxxbridge1$Hasher$operator$sizeof"]
            extern "C" fn __sizeof_Hasher() -> usize {
                __AssertSized::<Hasher>().size()
            }
            #[doc(hidden)]
            #[export_name = "cxxbridge1$Hasher$operator$alignof"]
            extern "C" fn __alignof_Hasher() -> usize {
                __AssertSized::<Hasher>().align()
            }
        }
    };
}

Whoa. That looks completely magical. Let's go through it step-by-step. The stuff prefixed with ::cxx::core is referring to the core crate, which is a subset of std. So ::cxx::core::marker::Sized is equivalent to std::marker::Sized, simplifying stuff a bit.

With that out of the way, let's look at the let _ = { ... } block that contains __AssertUnpin. What's been done here is a type system trick that validates whether Hasher implements std::marker::Unpin. This is done by declaring an empty generic function that is restricted to Ts implementing said trait, and instantiating it with Hasher as the type argument. This will of course trivially be optimized out, but it only compiles if Hasher is Unpin. Pretty nifty.
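
The same trick can be tried out in isolation. The following is a minimal sketch with made-up types: Movable and Immovable are hypothetical, and assert_unpin and requires_unpin are names chosen for this example rather than anything cxx provides.

```rust
use std::marker::PhantomPinned;

/// An ordinary type; it is Unpin automatically.
pub struct Movable(pub u32);

/// `PhantomPinned` opts a type out of Unpin.
pub struct Immovable(pub u32, pub PhantomPinned);

const _: () = {
    let _ = {
        // Empty generic function: only its trait bound matters.
        fn assert_unpin<T: ?Sized + Unpin>() {}
        // Naming this instantiation makes the compiler check the bound.
        // Writing `assert_unpin::<Immovable>` here instead is rejected
        // at compile time, because Immovable is not Unpin.
        assert_unpin::<Movable>
    };
};

/// The same bound on a function that is actually called at run time.
pub fn requires_unpin<T: Unpin>(value: T) -> T {
    value
}
```

Nothing in the const block survives into the compiled program; its only job is to make compilation fail when the bound does not hold.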

The fn __AssertSized sort of does the same thing, but it returns an alloc::Layout, and has rather strange bounds:

T: ?::cxx::core::marker::Sized + ::cxx::core::marker::Sized

I'm not 100% certain why one would first relax the implicit Sized bound on T with ?Sized, and then immediately assert it again. After some digging around in the git history, it seems that this is a means of getting prettier compiler messages in case a type were declared that is not Sized. Please let me know if you know the actual reason!

Using the Layout that __AssertSized returns, the size and alignment of the Hasher is exposed to C++ using the __sizeof_Hasher and __alignof_Hasher extern "C" fns.

This information is used by cxx to validate the correct use of our Hasher type. Very cool.
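
Stripped of the macro noise, exposing a type's size and alignment boils down to something like the sketch below. Sample is a hypothetical type, and layout_of, sizeof_sample and alignof_sample are illustrative names, not the symbols cxx exports.

```rust
use std::alloc::Layout;

/// Hypothetical opaque type whose layout the other side may want to know.
pub struct Sample {
    pub state: u32,
    pub amount: u64,
}

/// `Layout::new` only accepts sized types, so an unsized T fails to compile.
fn layout_of<T>() -> Layout {
    Layout::new::<T>()
}

/// Size of Sample in bytes, callable over a C ABI.
pub extern "C" fn sizeof_sample() -> usize {
    layout_of::<Sample>().size()
}

/// Alignment of Sample in bytes, callable over a C ABI.
pub extern "C" fn alignof_sample() -> usize {
    layout_of::<Sample>().align()
}
```

The foreign side never needs to know the fields of Sample; two numbers are enough to reason about storage for it.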

The C++ functions prefixed with cxxbridge1$ from the previous section were generated by the #[cxx::bridge] macro in the Rust code. It's also fun to see what cargo expand does here:

#[doc(hidden)]
#[export_name = "cxxbridge1$init"]
unsafe extern "C" fn __init() -> *mut Hasher {
    let __fn = "cxx_crc32fast::ffi::init";
    fn __init() -> ::cxx::alloc::boxed::Box<Hasher> {
        super::init()
    }
    ::cxx::private::prevent_unwind(
        __fn,
        move || ::cxx::alloc::boxed::Box::into_raw(__init()),
    )
}
#[doc(hidden)]
#[export_name = "cxxbridge1$Hasher$update"]
unsafe extern "C" fn __Hasher__update(
    __self: &mut Hasher,
    buf: ::cxx::private::RustSlice,
) {
    let __fn = "cxx_crc32fast::ffi::Hasher::update";
    fn __Hasher__update(__self: &mut Hasher, buf: &[u8]) {
        Hasher::update(__self, buf)
    }
    ::cxx::private::prevent_unwind(
        __fn,
        move || unsafe { __Hasher__update(__self, buf.as_slice::<u8>()) },
    )
}
#[doc(hidden)]
#[export_name = "cxxbridge1$finalize"]
unsafe extern "C" fn __finalize(h: *mut Hasher) -> u32 {
    let __fn = "cxx_crc32fast::ffi::finalize";
    fn __finalize(h: ::cxx::alloc::boxed::Box<Hasher>) -> u32 {
        super::finalize(h)
    }
    ::cxx::private::prevent_unwind(
        __fn,
        move || unsafe { __finalize(::cxx::alloc::boxed::Box::from_raw(h)) },
    )
}
[...]

The calls to ::cxx::private::prevent_unwind, as the name suggests, prevent panics from unwinding across the FFI boundary into C++.
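
One common way to keep a panic from crossing an extern "C" boundary is to catch it and abort the process. The sketch below does that with std::panic::catch_unwind. It is an assumption made for illustration and not necessarily how cxx implements prevent_unwind; abort_on_panic, panicked and divide are hypothetical names.

```rust
use std::panic::{catch_unwind, UnwindSafe};
use std::process::abort;

/// Run `f`; if it panics, report where and abort instead of unwinding further.
pub fn abort_on_panic<F, R>(label: &str, f: F) -> R
where
    F: FnOnce() -> R + UnwindSafe,
{
    match catch_unwind(f) {
        Ok(value) => value,
        Err(_) => {
            eprintln!("panic in ffi function {}, aborting", label);
            abort()
        }
    }
}

/// Helper to observe a caught panic without aborting.
pub fn panicked<F, R>(f: F) -> bool
where
    F: FnOnce() -> R + UnwindSafe,
{
    catch_unwind(f).is_err()
}

/// Example FFI entry point: dividing by zero panics in Rust,
/// and the wrapper turns that panic into an abort.
pub extern "C" fn divide(a: u32, b: u32) -> u32 {
    abort_on_panic("divide", move || a / b)
}
```

Aborting is blunt, but it is well defined, whereas letting a Rust panic unwind through frames that do not expect it is not.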

And with that, we've come full circle!

Stitching up

To check that our bridge works as intended, let's create a simple C++ application that reads a line from stdin, hashes it, and spits out the result. Create a new C++ source file src/crc32fast.cc with the following contents:

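#include "cxx-crc32fast/include/crc32fast.h" // assumed to exist in the project; not shown in this article
#include "cxx-crc32fast/src/lib.rs.h"
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <utility>
#include <vector>

int main() {
    // Read input from stdin
    std::istreambuf_iterator<char> begin{std::cin}, end;
    std::vector<unsigned char> input{begin, end};
    if (!input.empty() && input.back() == '\n') {
        input.pop_back(); // drop the linefeed
    }
    rust::Slice<const uint8_t> slice{input.data(), input.size()};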

    // Hash it
    rust::Box<Hasher> h = init();
    h->update(slice);
    uint32_t output = finalize(std::move(h));

    // Write to stdout.
    std::cout << std::setw(8) << std::setfill('0') << std::hex << output << std::endl;
}

Nothing fancy: we read some bytes, initialize the Hasher, pass it the slice and finalize it, and print. We can instruct Cargo to invoke cxx-build and compile our C++ application by registering it in build.rs:

fn main() {
    cxx_build::bridge("src/lib.rs")
        .file("src/crc32fast.cc")
        .compile("cxx-crc32fast");

    println!("cargo:rerun-if-changed=src/lib.rs");
    println!("cargo:rerun-if-changed=src/crc32fast.cc");
    println!("cargo:rerun-if-changed=include/crc32fast.h");
}

Compiling this using cargo build will produce a .rlib file, which is not an executable. Let's trick Cargo into realizing that it needs to produce a binary, by adding the following to our Cargo.toml:

[lib]
crate-type = ["bin"]

The compiler will still be confused though, as it tries to find the main function in the Rust code, which isn't there. It's defined in the C++ code instead. To disable this behavior, add the #![no_main] attribute to the top of our src/lib.rs:

#![no_main]

The linker will figure out where to find the main function. Compile the thing and run:

$ cargo build
$ cat hello.txt | target/debug/cxx_crc32fast
01d7afb4

I admit we can improve the compilation config, but hey, it works! That's a success in my book!

Recap

So far we've seen a general overview of how cxx works. We've seen how cxx limits the way you're able to express your FFI boundary, and how it generates safe, cheap glue code in return. But we haven't seen a lot of its features yet. In a future article (schedules allowing...), we'd like to step up our game a bit and write a little JSON prettifier. After that, we're planning to tackle the build problem by looking at a bigger example, based on Qt. Stay tuned!
