Mix in Rust with C++
This article is part of our Rust Interop Guide.
So, the other day I read the following in a book about a certain programming language:
[...] is a general-purpose programming language emphasizing the design and use of type-rich, lightweight abstractions. It is particularly suited for resource-constrained applications, such as those found in software infrastructures. [...] rewards the programmer who takes the time to master techniques for writing quality code. [...] is a language for someone who takes the task of programming seriously. Our civilization depends critically on software; it had better be quality software.1
In a post on a blog that features mostly Rust-centered articles, what's the name of the language I left out, of course? That's right: C++. Maybe the part on taking programming seriously gave it away. Anyway.
It's not so strange that there's a lot of interest in Rust from the C++ space: both languages operate in the same niche. Both languages try to make systems programming more scalable and less error-prone. It's no secret that Rust has taken a lot of inspiration from C++: the memory ordering model for atomics was shamelessly copied from C++20's, and the RAII idiom, which originated in C++, is used extensively in Rust.
Apart from just copying stuff, Rust aims to improve on the status quo in systems programming. The focus on correctness and memory safety has made many a C++ developer curious. But, as always in this blog series, the question is: what if you want to oxidize your C++ project? What if you want to slowly introduce Rust in your C++ code base?
In this post, I'll help you get going by going through some examples. I will, however, assume you've read the series introduction and the post on calling Rust from C. We'll start out with two rather simple projects, and end with a larger Qt-based project in which a Rust crate is used to handle image manipulation.
The tooling
As with other languages, tools exist to improve interoperability between Rust and C++, compared to hand-rolled C-ABIs. A major player in this department is cxx, written by, you guessed it: David Tolnay, the developer behind widely-used crates like cargo-expand, anyhow, and syn. cxx promises 'a safe mechanism for calling C++ code from Rust and Rust code from C++'. Nice.
cxx takes a different route to doing interop with C++ than PyO3 does with Python, and for good reason. Whereas PyO3 allows you to verbalize all kinds of Python concepts from Rust, cxx covers just those concepts that are present in both C++ and Rust. Those concepts can be translated between the languages with 'zero or negligible overhead', and low overhead is important in systems programming contexts. What's more, cxx is about 'carving out a highly expressive set of functionality about which we can make powerful safety guarantees today and extend over time'. So what you get is a safe and speedy FFI boundary.
There is a cost to this, however: you pay in expressiveness. It'll take some time to get used to having to massage your problems into the common subset cxx supports. Let's have a look!
Cooking some hash
We'll get started with a rather simple example: a C++ application that uses a Rust implementation for calculating a CRC32 hash for a given input. It'll read some bytes from stdin until a newline is encountered, and write the CRC32 hash of the bytes to stdout as a hexadecimal number:
$ cat hello.txt | cxx_crc32fast
1cf81ca7
Let's start out by quickly setting up a Rust library that can do this:
$ cargo new cxx-crc32fast --lib && cd cxx-crc32fast
$ cargo add crc32fast@1.4
We'll leave the actual CRC32 calculation up to the crc32fast
crate. Our crate will take care of exposing the functionality to C++. crc32fast
exposes two APIs: the crc32fast::hash
function, and the crc32fast::Hasher
struct. The former is a very easy way to hash a bunch of bytes: simply pass it a slice and it'll get to work. The downside is that crc32fast::hash
needs all the data at once. And we (or at least I) want our implementation to support hashing data in chunks. For that, we'll make use of crc32fast::Hasher
. To make crc32fast::Hasher
play nice with cxx
, we'll have to wrap it and expose some of its functionality. Here's a first stab at it:
struct Hasher(crc32fast::Hasher);

impl Hasher {
    fn new() -> Self {
        Self(crc32fast::Hasher::new())
    }

    fn update(&mut self, buf: &[u8]) {
        // `buf` is already a slice reference, so pass it along as-is
        self.0.update(buf)
    }

    fn finalize(self) -> u32 {
        self.0.finalize()
    }
}
Great! We can now create an instance of our own Hasher
, and call the update
and finalize
methods of the crc32fast::Hasher
, simply by forwarding to the corresponding methods. Easy enough. Let's add some cxx
sauce to our dish.
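To see why the chunked API matters, here is a minimal, dependency-free sketch. It assumes a hand-rolled bitwise CRC-32 (the hypothetical SlowCrc32 below) as a stand-in for crc32fast::Hasher, which is much faster in practice. The helper names hash_all and hash_chunked are made up for this illustration as well. Feeding the data in chunks should yield the same checksum as hashing it in one go:

```rust
/// Hypothetical stand-in for `crc32fast::Hasher`: a slow, bitwise CRC-32
/// (reflected polynomial 0xEDB88320), used only to keep this sketch std-only.
struct SlowCrc32 {
    state: u32,
}

impl SlowCrc32 {
    fn new() -> Self {
        Self { state: 0xFFFF_FFFF }
    }

    /// Feed another chunk of bytes into the running checksum.
    fn update(&mut self, buf: &[u8]) {
        for &byte in buf {
            self.state ^= u32::from(byte);
            for _ in 0..8 {
                // All ones if the lowest bit is set, all zeros otherwise.
                let mask = (self.state & 1).wrapping_neg();
                self.state = (self.state >> 1) ^ (0xEDB8_8320 & mask);
            }
        }
    }

    /// Consume the hasher and return the final checksum.
    fn finalize(self) -> u32 {
        !self.state
    }
}

/// One-shot hashing, analogous to `crc32fast::hash`: needs all data at once.
fn hash_all(data: &[u8]) -> u32 {
    let mut hasher = SlowCrc32::new();
    hasher.update(data);
    hasher.finalize()
}

/// Chunked hashing: the data never has to be in memory all at once.
fn hash_chunked(chunks: &[&[u8]]) -> u32 {
    let mut hasher = SlowCrc32::new();
    for chunk in chunks {
        hasher.update(chunk);
    }
    hasher.finalize()
}
```

Calling hash_all on the bytes of "123456789" and hash_chunked on the same bytes split into "1234" and "56789" gives the same value, which is the reason the streaming Hasher is the API worth exposing.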
Water under the bridge
The cxx
crate allows you to specify the way Rust functionality gets exposed to C++ and vice versa. You do this using the #[cxx::bridge]
procedural macro, which you invoke on a module that defines what is exposed and how. Here's what it looks like:
#[cxx::bridge]
mod ffi {
extern "Rust" {
/* Stuff defined in Rust, exposed to C++ */
}
unsafe extern "C++" {
/* Stuff defined in C++, exposed to Rust */
}
}
In this post, we'll focus on the extern "Rust" block, in which we declare types and functions that are defined in Rust, and expose them to C++. That'll allow the C++ application to use them.
We would like to share the Hasher
type with C++. cxx
does support 'Shared types', which 'enable both languages to have visibility into the internals of a type'. Sounds good, let's give it a go.
First, we've got to add cxx
as a dependency:
$ cargo add cxx@1
Then, set up the bridge, defining our Hasher
as a Shared type:
#[cxx::bridge]
mod ffi {
struct Hasher(crc32fast::Hasher);
}
And run a cargo check
:
$ cargo check
Checking cxx-crc32fast v0.1.0 (/home/hd/dev/tg/edu/cxx-crc32fast)
error: tuple structs are not supported
Ok, well, that might be easily fixable by simply using named fields:
#[cxx::bridge]
mod ffi {
struct Hasher {
hash: crc32fast::Hasher
}
}
Try again:
$ cargo check
Checking cxx-crc32fast v0.1.0 (/home/hd/dev/tg/edu/cxx-crc32fast)
error: unsupported type
--> src/lib.rs:5:15
|
5 | hash: crc32fast::Hasher
| ^^^^^^^^^^^^^^^^^
Huh. So what type of fields does cxx
support in shared structs? Well, for cxx
to be able to validate the safe use of whatever goes over the FFI-boundary, it needs everything that is declared in a shared struct to be safe to pass. As crc32fast::Hasher
is defined somewhere else, cxx
can't check it. Shared types are not going to solve our problems here.
As an alternative to shared types, cxx supports using opaque types in the bridge. Opaque types hide their fields and are accessible only via pointer indirection. You declare them in the extern "Rust" block inside the #[cxx::bridge] module, and cxx will look for a matching definition in your Rust code. Let's try that with our Hasher tuple struct:
#[cxx::bridge]
mod ffi {
extern "Rust" {
type Hasher;
}
}
struct Hasher(crc32fast::Hasher);
impl Hasher {
/* Methods omitted */
}
And cargo check
the thing! Apart from some warnings about unused things, Rust is fine with the code so far.
Down below I will look at what is actually happening under the hood, but let's press on.
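For intuition, here's roughly what 'opaque behind a pointer' means at a hand-rolled C ABI. This is a sketch and not what cxx emits: the Counter type and the counter_* function names are made up for illustration. The foreign side only ever holds a pointer, while Rust allocates, mutates, and frees the value:

```rust
/// Hypothetical opaque type: its fields are invisible to the foreign side.
pub struct Counter {
    total: u32,
}

// Note: a real export would also need a stable symbol name (`no_mangle`);
// that is left out here to keep the sketch minimal.

/// Allocate a `Counter` on the heap and hand out an owning raw pointer.
pub extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { total: 0 }))
}

/// Borrow the value behind the pointer and mutate it.
///
/// # Safety
/// `counter` must come from `counter_new` and must not have been freed.
pub unsafe extern "C" fn counter_add(counter: *mut Counter, amount: u32) {
    let counter = unsafe { &mut *counter };
    counter.total = counter.total.wrapping_add(amount);
}

/// Take ownership back, read the result, and free the allocation.
///
/// # Safety
/// `counter` must come from `counter_new`; it is dangling after this call.
pub unsafe extern "C" fn counter_finish(counter: *mut Counter) -> u32 {
    let counter = unsafe { Box::from_raw(counter) };
    counter.total
}
```

Every safety rule here lives in a doc comment that the caller has to honor by hand. That manual bookkeeping is the part a tool like cxx aims to take off your plate.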
impl Hasher
Okay, now let's expose the methods on Hasher as well. It's not possible to expose an associated function, so we'll add a free function that calls Hasher::new instead:
#[cxx::bridge]
mod ffi {
extern "Rust" {
type Hasher;
fn init() -> Hasher;
}
}
struct Hasher(crc32fast::Hasher);
fn init() -> Hasher {
Hasher::new()
}
/* - snip - */
As you can see, we declare the FFI boundary in the #[cxx::bridge]
, but implement it in its parent module. Aaand check:
$ cargo check -q
error: returning opaque Rust type by value is not supported
[...]
Ah yeah, of course. Opaque types can only be exposed behind some pointer indirection, typically to a heap-allocated value. Let's update our code to reflect this:
#[cxx::bridge]
mod ffi {
extern "Rust" {
type Hasher;
fn init() -> Box<Hasher>;
}
}
struct Hasher(crc32fast::Hasher);
fn init() -> Box<Hasher> {
Box::new(Hasher::new())
}
All right, that seems to compile again. One fewer warning than before, even. Let's now finish our bridge by exposing Hasher::update and Hasher::finalize as well. This time, we're exposing methods instead of an associated function. Luckily, cxx does support exposing methods. Here's what the code looks like at this point:
#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type Hasher;
        fn init() -> Box<Hasher>;
        fn update(&mut self, buf: &[u8]);
        fn finalize(self) -> u32;
    }
}

struct Hasher(crc32fast::Hasher);

fn init() -> Box<Hasher> {
    Box::new(Hasher::new())
}

impl Hasher {
    fn new() -> Self {
        Self(crc32fast::Hasher::new())
    }

    fn update(&mut self, buf: &[u8]) {
        self.0.update(buf)
    }

    fn finalize(self) -> u32 {
        self.0.finalize()
    }
}
As the extern "Rust"
block in the bridge contains a single type
declaration, cxx
will assume the self
parameters refer to this type. In case there is more than one type
declaration, cxx
allows you to disambiguate by declaring the type of the self
parameter:
fn update(self: &mut Hasher, buf: &[u8]);
Now, let's again check this code:
$ cargo check -q
error: unsupported method receiver
--> src/lib.rs:10:21
|
10 | fn finalize(self) -> u32;
| ^^^^
error: could not compile `cxx-crc32fast` (lib) due to 1 previous error
Hmm. Looks like consuming methods are not (yet) supported. And in this case, that makes sense: as we're exposing Hasher
only as an opaque type to C++, there's no way for C++ to pass a Hasher
to Hasher::finalize()
by value. We'll have to work around this. One way to do it is by having Hasher::finalize
take a reference to self
, and using crc32fast::Hasher
's Clone
implementation to invoke finalize
on a cloned Hasher:
#[cxx::bridge]
mod ffi {
extern "Rust" {
type Hasher;
/* - snip - */
fn finalize(&self) -> u32;
}
}
struct Hasher(crc32fast::Hasher);
/* - snip - */
impl Hasher {
/* - snip - */
fn finalize(&self) -> u32 {
self.0.clone().finalize()
}
}
And cxx is totally happy! In this case, our Hasher is cheap to clone, so this is not even a bad workaround. But you won't always be this lucky. To support calling a consuming method, we'll again expose finalize as a free function, and then forward to the corresponding method on Hasher. As init returns a Box<Hasher>, it makes sense to allow C++ to pass that to finalize. Here's the result:
#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type Hasher;
        fn init() -> Box<Hasher>;
        fn update(&mut self, buf: &[u8]);
        fn finalize(h: Box<Hasher>) -> u32;
    }
}

struct Hasher(crc32fast::Hasher);

fn init() -> Box<Hasher> {
    Box::new(Hasher::new())
}

// Free function exposed over the bridge; takes ownership of the box
fn finalize(h: Box<Hasher>) -> u32 {
    h.finalize()
}

impl Hasher {
    fn new() -> Self {
        Self(crc32fast::Hasher::new())
    }

    fn update(&mut self, buf: &[u8]) {
        self.0.update(buf)
    }

    fn finalize(self) -> u32 {
        self.0.finalize()
    }
}
And now cargo check
has nothing more to complain about!
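The difference between the two workarounds can be seen in plain Rust, without cxx. In this sketch, Sum is a made-up stand-in for crc32fast::Hasher, peek corresponds to the clone-based finalize(&self), and finish corresponds to the free function that takes the Box by value:

```rust
/// Hypothetical stand-in for `crc32fast::Hasher`: it just adds bytes up.
#[derive(Clone)]
struct Sum(u32);

impl Sum {
    fn update(&mut self, buf: &[u8]) {
        for &byte in buf {
            self.0 = self.0.wrapping_add(u32::from(byte));
        }
    }

    /// Consuming method, like `crc32fast::Hasher::finalize`.
    fn finalize(self) -> u32 {
        self.0
    }
}

/// Workaround 1: finalize a clone, so the original stays usable.
fn peek(sum: &Sum) -> u32 {
    sum.clone().finalize()
}

/// Workaround 2: a free function that takes ownership of the box.
/// Calling a `self`-by-value method on a `Box<T>` moves the value out
/// of the box, and the allocation is freed along the way.
fn finish(sum: Box<Sum>) -> u32 {
    sum.finalize()
}
```

With peek, the caller can keep updating afterwards, at the price of a clone per call. With finish, the box is gone once the function returns, which matches what the C++ side sees when it passes std::move(h).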
Exposing this simple piece of code has been quite tricky. We've seen a bunch of errors we had to work around. This is the price you pay for safe and fast FFI when using C++. You need to really hammer your problem into the jig of cxx
sometimes. But you do get back nice things.
The other side
Right, now that the Rust side of things seems to be complete, let's get some C++ going for us. First, let's configure our crate such that the C++ side of our bridge gets generated. For that, we'll add the cxx-build
crate as a build dependency:
$ cargo add cxx-build@1 --build
Next up, let's create a build.rs
in our crate root, in which we invoke cxx-build
:
fn main() {
cxx_build::bridge("src/lib.rs")
.compile("cxx-crc32fast");
println!("cargo:rerun-if-changed=src/lib.rs");
}
This will have cxx-build generate the C++ bindings to our bridge, and then compile the thing. After a cargo build, you can find the generated C++ in target/cxxbridge:
target/cxxbridge/
├── cxx-crc32fast
│ └── src
│ ├── lib.rs.cc -> ../../../debug/build/cxx-crc32fast-6147c3641f4e46e4/out/cxxbridge/sources/cxx-crc32fast/src/lib.rs.cc
│ └── lib.rs.h -> ../../../debug/build/cxx-crc32fast-6147c3641f4e46e4/out/cxxbridge/include/cxx-crc32fast/src/lib.rs.h
└── rust
└── cxx.h -> /path/to/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cxx-1.0.124/include/cxx.h
A bunch of symlinks to headers and code. cxxbridge/rust/cxx.h declares the C++ counterparts of cxx's built-in types, such as rust::Box and rust::Slice. The more interesting stuff is in cxxbridge/cxx-crc32fast/src/lib.rs.*. This is the C++ translation of our cxx bridge. For instance, at the end of lib.rs.h, the Hasher is declared:
struct Hasher;
#ifndef CXXBRIDGE1_STRUCT_Hasher
#define CXXBRIDGE1_STRUCT_Hasher
struct Hasher final : public ::rust::Opaque {
void update(::rust::Slice<::std::uint8_t const> buf) noexcept;
~Hasher() = delete;
private:
friend ::rust::layout;
struct layout {
static ::std::size_t size() noexcept;
static ::std::size_t align() noexcept;
};
};
#endif // CXXBRIDGE1_STRUCT_Hasher
::rust::Box<::Hasher> init() noexcept;
::std::uint32_t finalize(::rust::Box<::Hasher> h) noexcept;
We can see the Hasher struct, and the init and finalize functions we declared in our bridge. Hasher even has a method named update, which takes a rust::Slice of bytes. Another interesting detail is that the Hasher struct inherits from the ::rust::Opaque class, which is defined as follows:
class Opaque {
public:
// Constructor
Opaque() = delete;
// Copy constructor
Opaque(const Opaque &) = delete;
// Destructor
~Opaque() = delete;
};
As you can see, anything Opaque can be neither created nor destroyed on the C++ side of the bridge. Its allocation is managed by the Rust side instead. Furthermore, the implementations of init, finalize and Hasher::update simply forward to the Rust side. Here's what that looks like:
::rust::Box<::Hasher> init() noexcept {
return ::rust::Box<::Hasher>::from_raw(cxxbridge1$init());
}
void Hasher::update(::rust::Slice<::std::uint8_t const> buf) noexcept {
cxxbridge1$Hasher$update(*this, buf);
}
::std::uint32_t finalize(::rust::Box<::Hasher> h) noexcept {
return cxxbridge1$finalize(h.into_raw());
}
By the way
I thought it'd be cool to have a look at the expansion of the #[cxx::bridge]
macro. With cargo-expand
installed, run the following:
$ cargo expand ffi
Here's what it spits out for our Hasher
struct without the impl
block:
/* attrs omitted */
mod ffi {
use super::Hasher;
#[doc(hidden)]
unsafe impl ::cxx::private::RustType for Hasher {}
#[doc(hidden)]
const _: () = {
let _ = {
fn __AssertUnpin<
T: ?::cxx::core::marker::Sized + ::cxx::core::marker::Unpin,
>() {}
__AssertUnpin::<Hasher>
};
{
#[doc(hidden)]
#[allow(clippy::needless_maybe_sized)]
fn __AssertSized<
T: ?::cxx::core::marker::Sized + ::cxx::core::marker::Sized,
>() -> ::cxx::core::alloc::Layout {
::cxx::core::alloc::Layout::new::<T>()
}
#[doc(hidden)]
#[export_name = "cxxbridge1$Hasher$operator$sizeof"]
extern "C" fn __sizeof_Hasher() -> usize {
__AssertSized::<Hasher>().size()
}
#[doc(hidden)]
#[export_name = "cxxbridge1$Hasher$operator$alignof"]
extern "C" fn __alignof_Hasher() -> usize {
__AssertSized::<Hasher>().align()
}
}
};
}
Whoa. That looks completely magical. Let's go through it step-by-step. The stuff prefixed with ::cxx::core
is referring to the core
crate, which is a subset of std
. So ::cxx::core::marker::Sized
is equivalent to std::marker::Sized
, simplifying stuff a bit.
With that out of the way, let's look at lines 8 to 13. What's been done here is a type system trick that validates whether Hasher implements std::marker::Unpin. This is done by declaring an empty generic function that is restricted to Ts implementing said trait, and naming its instantiation with Hasher as the type argument. This will of course trivially be optimized out, but it only compiles if Hasher is Unpin. Pretty nifty.
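The same trick works in everyday Rust code. Here's a minimal sketch with made-up types (Plain and Pinned) and a made-up helper name (assert_unpin), not the code cxx generates:

```rust
use std::marker::PhantomPinned;

/// Compiles only when `T` implements `Unpin`; the body is empty.
fn assert_unpin<T: ?Sized + Unpin>() {}

/// Hypothetical type that is `Unpin`, like most ordinary Rust types.
struct Plain(u32);

/// Hypothetical type that opts out of `Unpin` via `PhantomPinned`.
struct Pinned {
    _marker: PhantomPinned,
}

// The check happens at compile time: naming the instantiation is enough,
// the function never needs to be called.
const _: () = {
    let _ = assert_unpin::<Plain>;
    // Uncommenting the next line breaks the build, as `Pinned` is not `Unpin`:
    // let _ = assert_unpin::<Pinned>;
};
```

Nothing in the const block runs at runtime. Its only job is to make the compiler reject the crate when the bound does not hold.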
The fn __AssertSized
sort of does the same thing, but it returns an alloc::Layout
, and has rather strange bounds:
T: ?::cxx::core::marker::Sized + ::cxx::core::marker::Sized
I'm not 100% certain why one would first relax the bound on T to ?Sized, and then immediately require Sized again. After some digging around in the git history, it seems that this is a means of getting prettier compiler messages in case a type is declared that is not Sized. Please let me know if you know the actual reason!
Using the Layout
that __AssertSized
returns, the size and alignment of the Hasher
is exposed to C++ using the __sizeof_Hasher
and __alignof_Hasher
extern "C" fn
s.
This information is used by cxx
to validate the correct use of our Hasher
type. Very cool.
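To make the size and alignment part concrete, here's a small std-only sketch. Blob, layout_of, blob_sizeof and blob_alignof are hypothetical names chosen for this example. It only shows how a Layout can be turned into two exported query functions, and makes no claim about what the C++ side of cxx does with the numbers:

```rust
use std::alloc::Layout;

/// Hypothetical opaque type; the foreign side never sees these fields.
struct Blob {
    id: u64,
    flag: u8,
}

/// Like `__AssertSized`: only compiles for `Sized` types, returns the layout.
fn layout_of<T>() -> Layout {
    Layout::new::<T>()
}

/// Size query the foreign side could call instead of hard-coding a number.
extern "C" fn blob_sizeof() -> usize {
    layout_of::<Blob>().size()
}

/// Alignment query, the counterpart of `blob_sizeof`.
extern "C" fn blob_alignof() -> usize {
    layout_of::<Blob>().align()
}
```

Because the numbers come from the Rust compiler itself, they stay correct when fields are added to or removed from the type.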
The functions prefixed with cxxbridge1$ that the generated C++ calls in the previous section are defined on the Rust side by the #[cxx::bridge] macro. It's also fun to see what cargo expand shows for them:
#[doc(hidden)]
#[export_name = "cxxbridge1$init"]
unsafe extern "C" fn __init() -> *mut Hasher {
let __fn = "cxx_crc32fast::ffi::init";
fn __init() -> ::cxx::alloc::boxed::Box<Hasher> {
super::init()
}
::cxx::private::prevent_unwind(
__fn,
move || ::cxx::alloc::boxed::Box::into_raw(__init()),
)
}
#[doc(hidden)]
#[export_name = "cxxbridge1$Hasher$update"]
unsafe extern "C" fn __Hasher__update(
__self: &mut Hasher,
buf: ::cxx::private::RustSlice,
) {
let __fn = "cxx_crc32fast::ffi::Hasher::update";
fn __Hasher__update(__self: &mut Hasher, buf: &[u8]) {
Hasher::update(__self, buf)
}
::cxx::private::prevent_unwind(
__fn,
move || unsafe { __Hasher__update(__self, buf.as_slice::<u8>()) },
)
}
#[doc(hidden)]
#[export_name = "cxxbridge1$finalize"]
unsafe extern "C" fn __finalize(h: *mut Hasher) -> u32 {
let __fn = "cxx_crc32fast::ffi::finalize";
fn __finalize(h: ::cxx::alloc::boxed::Box<Hasher>) -> u32 {
super::finalize(h)
}
::cxx::private::prevent_unwind(
__fn,
move || unsafe { __finalize(::cxx::alloc::boxed::Box::from_raw(h)) },
)
}
[...]
The calls to ::cxx::private::prevent_unwind, as the name suggests, keep panics from unwinding across the FFI boundary.
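If you were writing such a guard by hand, one common approach is to catch the panic and abort the process. The sketch below uses that approach with std::panic::catch_unwind. It is a stand-in technique and not cxx's actual implementation, and guard_unwind and checked_halve are made-up names:

```rust
use std::panic::{catch_unwind, UnwindSafe};
use std::process::abort;

/// Run `f`; if it panics, report `label` and abort instead of unwinding.
fn guard_unwind<F, R>(label: &str, f: F) -> R
where
    F: FnOnce() -> R + UnwindSafe,
{
    match catch_unwind(f) {
        Ok(value) => value,
        Err(_) => {
            eprintln!("panic in ffi function {label}, aborting");
            abort()
        }
    }
}

/// Hypothetical exported function: any panic in the body stops here
/// and never reaches the foreign caller's stack frames.
extern "C" fn checked_halve(value: u32) -> u32 {
    guard_unwind("checked_halve", move || {
        assert!(value % 2 == 0, "value must be even");
        value / 2
    })
}
```

Aborting is a blunt instrument, but it is predictable: the foreign caller either gets a valid return value or the process stops.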
And with that, we've come full circle!
Stitching up
To check that our bridge works as intended, let's create a simple C++ application that reads a line from stdin, hashes it, and spits out the result. Create a new C++ source file src/crc32fast.cc
with the following contents:
#include "cxx-crc32fast/include/crc32fast.h"
#include "cxx-crc32fast/src/lib.rs.h"
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <utility>
#include <vector>

int main() {
  // Read input from stdin
  std::istreambuf_iterator<char> begin{std::cin}, end;
  std::vector<unsigned char> input{begin, end};

  // Drop the trailing linefeed, if there is one
  std::size_t len = input.size();
  if (len > 0 && input[len - 1] == '\n') {
    --len;
  }
  rust::Slice<const uint8_t> slice{input.data(), len};

  // Hash it
  rust::Box<Hasher> h = init();
  h->update(slice);
  uint32_t output = finalize(std::move(h));

  // Write to stdout as 8 zero-padded hex digits
  std::cout << std::setw(8) << std::setfill('0') << std::hex << output << std::endl;
}
Nothing fancy: we read some bytes, initialize the Hasher
, pass it the slice and finalize it, and print. We can instruct Cargo to invoke cxx-build
and compile our C++ application by registering it in build.rs
:
fn main() {
cxx_build::bridge("src/lib.rs")
.file("src/crc32fast.cc")
.compile("cxx-crc32fast");
println!("cargo:rerun-if-changed=src/lib.rs");
println!("cargo:rerun-if-changed=src/crc32fast.cc");
println!("cargo:rerun-if-changed=include/crc32fast.h");
}
Compiling this using cargo build
will produce a .rlib
file, which is not an executable. Let's trick Cargo into realizing that it needs to produce a binary, by adding the following to our Cargo.toml
:
[lib]
crate-type = ["bin"]
The compiler will still be confused though, as it tries to find the main
function in the Rust code, which isn't there. It's defined in the C++ code instead. To disable this behavior, add the #![no_main]
attribute to the top of our src/lib.rs
:
#![no_main]
The linker will figure out where to find the main
function. Compile the thing and run:
$ cargo build
$ cat hello.txt | target/debug/cxx_crc32fast
01d7afb4
I admit we can improve the compilation config, but hey, it works! That's a success in my book!
Recap
So far we've seen a general overview of how cxx
works. We've seen how cxx
limits the way you're able to express your FFI boundary, and how it generates safe, cheap glue code in return. But we haven't seen a lot of its features yet. In a future article (schedules allowing...), we'd like to step up our game a bit and write a little JSON prettifier. After that, we're planning to tackle the build problem by looking at a bigger example, based on Qt. Stay tuned!