Mix in Rust with C

Henk
Embedded software engineer
So, you've just read my previous post on Rust interoperability in general, and now you're curious about how to actually apply the concepts to your situation. You've come to the right place, because in this post and the two that follow, I'll demonstrate how to make Rust and C talk to each other.

This article is part of our Rust Interop Guide.

If you're interested in having Rust talk to a language other than C, it's good to know that the concepts and tools introduced in this post form the basis of interfacing with almost every other programming language.

C is a great language. Originating as the language of the UNIX operating system, it has kept to its spirit of being a small language, that is 'unrestrictive and effective for many tasks'.1 In many universities, it is the first language students are taught, as its thin abstractions map well to the way even modern computers operate. C is everywhere: it forms the basis of every modern operating system, and can be compiled for exotic architectures. It's a true systems programming language.

It's also a very hard language. By that, I mean that doing C right takes a lot of experience, code review, documentation, and tooling. And that does not combine very well with one aspect of software development: it's done by humans. Humans make mistakes. Even if they think they don't, surely they'll agree their colleagues make mistakes. Some of those mistakes result in hard-to-debug Undefined Behaviour, which Rust aims to reduce and make users aware of. Which, of course, is why you want to introduce Rust in your project, or already have.

So in what kind of cases is mixing Rust and C a good idea? The first one is obvious: you want to take a big, complex project to the present day by embedding Rust in your C. Other cases are when you're doing system calls to your OS directly, or when linking closed-source libraries. Vendors can simply supply C header files that specify the library API, and supply the implementation readily-compiled. Now you need to embed C in your Rust.

In my previous post, I outlined how Rust's FFI system works, and how you could go about creating an interface by hand. Now it's time to throw some tooling at the problem, to make our lives way better. In this article, we'll have a look at how to use a Rust library from a C project, I'll show some tools with which you can introduce Rust in your C project, and we'll have a look at generating bindings using cbindgen. I've also been fiddling with Diplomat, a tool that you can use to generate glue code, but we'll discuss Diplomat in the next article. If you're interested in embedding C libraries in your Rust project, you'll want to read my post on embedding C in Rust after having read this one.

Embedding Rust in your C

To get some Rust code working for you in your C project, one option is to use cbindgen. It's a tool by Mozilla that 'creates C/C++11 headers for Rust libraries which expose a public C API'. So it takes your Rust code, finds any extern fns, statics, and consts, and generates header files that correspond to those.

Here's how it works. Let's say we're creating a Rust library that is to be used by a C application. We run cargo new --lib rust-in-c to generate a Rust library with the name rust-in-c. That part shouldn't be so surprising. However, Rust libraries compiled using the default settings cannot be stitched together by a linker, as their format is, again, not stable. They're a kind of intermediate compilation artifact that Rust uses to create static system libraries and binaries, which are linkable.

We need to instruct the compiler to build us a system library, which a linker can take and use symbols from. These system libraries come in two flavours: dynamic and static. A dynamic library can be referred to from some executable, but is not included in it. The operating system will search for any symbols the executable refers to when starting your program, meaning that the dynamic library artifact will need to be present when running the application. The upside is that a dynamic library that is used by many different executables needs to be present on the system only once, saving space and making patching easier. However, Rust defaults to static linking: linking at compile time, yielding a single, stand-alone executable. To support static linking when mixing languages, Rust can output a static system library, which can then be used by some other linker to build the executable. This is the recommended way to include Rust code into your C projects.

What's more, we'll be relying on the artifact to have a specific name, so we need to state it explicitly. The way we instruct Rust to use a specific name and to build both a dynamic and a static system library is by adding the following lines to our Cargo.toml:

[lib]
name = "rust_in_c"
crate-type = ["cdylib", "staticlib"]

You can omit either "cdylib" or "staticlib", if you need just one of them. Now, when we run cargo build, the artifacts can be found in target/debug (or target/release, if you've done a release build):

$ tree target/debug -L 1
target/debug
├── build
├── deps
├── examples
├── incremental
├── librust_in_c.a     <-- static library (archive)
├── librust_in_c.d
└── librust_in_c.so    <-- dynamic library (shared object)

I'm on Linux, so I get a .so file for a dynamic library, but if you're running Windows you'll get a .dll file, and on macOS, a .dylib file is created. The static library is a .a file on Linux, macOS and MinGW on Windows, or a .lib file when using MSVC on Windows.

Hello, Rust!

Cool! So now we've got a pretty useless set of artifacts! Let's add some Rust code that we will be calling from C. Let's start out simple, to make sure all moving parts are operating correctly. In src/lib.rs, add the following code:

#[no_mangle]
pub extern "C" fn say_hello() {
    println!("🦀 Hello, Rusty world! 🦀");
}

That just defines a simple function that takes no parameters and returns naught, and prints something to stdout. Now cbindgen comes in. First, add cbindgen as a build dependency by running cargo add --build cbindgen, then create a build.rs file in the same folder as Cargo.toml, and add the following content:

use std::env;

fn main() {
    let crate_dir = env::var("CARGO_MANIFEST_DIR").unwrap();

    std::fs::remove_dir_all("./bindings").ok();
    std::fs::create_dir_all("./bindings").unwrap();

    // Invoke cbindgen
    cbindgen::Builder::new()
        .with_crate(crate_dir)
        .with_language(cbindgen::Language::C)
        .generate()
        .unwrap()
        .write_to_file("bindings/rust-in-c.h");
}

This build script invokes cbindgen when building our library crate, which in turn will analyze our code and output a compatible C header file at bindings/rust-in-c.h. Its strategy is beautiful in its simplicity: cbindgen just walks your code, looking for #[no_mangle] pub extern fn (functions), #[no_mangle] pub static (globals), and pub const (constants), and then generates headers declaring those items. The generated header file for our library looks like this:

#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

void say_hello(void);

Looks good! Now how do we use it? The next step is to create a simple C program in a file called main.c:

#include "bindings/rust-in-c.h"

int main()
{
    say_hello();

    return 0;
}

Compile the thing:

# Compile with dynamic lib
clang target/debug/librust_in_c.so main.c -o rust-in-c-dynamic

# Compile statically
clang main.c target/debug/librust_in_c.a -o rust-in-c-static

And now run either of them:

$ ./rust-in-c-static
🦀 Hello, Rusty world! 🦀
$ ./rust-in-c-dynamic
🦀 Hello, Rusty world! 🦀

On Windows, you need to make sure the .dll file resides in the same folder as the rust-in-c-dynamic executable, or you'll get an error caused by the OS not being able to find the dynamic library being referred to from the executable.

Doing something useful

All right! It works! Apart from some theory and configuration, this seems to Just Work™. But that was a very simple example. Let's try something a bit more elaborate. Create a new Rust module in src/crc32.rs, with the following code in it:

fn crc32_rust(data: &[u8]) -> u32 {
    let mut crc32 = 0xFFFFFFFFu32;

    for byte in data {
        let lookup_index = (crc32 ^ *byte as u32) & 0xff;
        crc32 = (crc32 >> 8) ^ CRC32_TABLE[lookup_index as usize]; // CRC32_TABLE is an array of 256 32-bit constants
    }

    // Finalize the CRC-32 value by inverting all the bits
    crc32 ^ 0xFFFFFFFFu32
}

const CRC32_TABLE: [u32; 256] = [
    0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA, 0x076DC419, 0x706AF48F, 0xE963A535, 0x9E6495A3,
    0x0EDB8832, 0x79DCB8A4, 0xE0D5E91E, 0x97D2D988, 0x09B64C2B, 0x7EB17CBD, 0xE7B82D07, 0x90BF1D91,
    0x1DB71064, 0x6AB020F2, 0xF3B97148, 0x84BE41DE, 0x1ADAD47D, 0x6DDDE4EB, 0xF4D4B551, 0x83D385C7,
    0x136C9856, 0x646BA8C0, 0xFD62F97A, 0x8A65C9EC, 0x14015C4F, 0x63066CD9, 0xFA0F3D63, 0x8D080DF5,
    0x3B6E20C8, 0x4C69105E, 0xD56041E4, 0xA2677172, 0x3C03E4D1, 0x4B04D447, 0xD20D85FD, 0xA50AB56B,
    0x35B5A8FA, 0x42B2986C, 0xDBBBC9D6, 0xACBCF940, 0x32D86CE3, 0x45DF5C75, 0xDCD60DCF, 0xABD13D59,
    0x26D930AC, 0x51DE003A, 0xC8D75180, 0xBFD06116, 0x21B4F4B5, 0x56B3C423, 0xCFBA9599, 0xB8BDA50F,
    0x2802B89E, 0x5F058808, 0xC60CD9B2, 0xB10BE924, 0x2F6F7C87, 0x58684C11, 0xC1611DAB, 0xB6662D3D,
    0x76DC4190, 0x01DB7106, 0x98D220BC, 0xEFD5102A, 0x71B18589, 0x06B6B51F, 0x9FBFE4A5, 0xE8B8D433,
    0x7807C9A2, 0x0F00F934, 0x9609A88E, 0xE10E9818, 0x7F6A0DBB, 0x086D3D2D, 0x91646C97, 0xE6635C01,
    0x6B6B51F4, 0x1C6C6162, 0x856530D8, 0xF262004E, 0x6C0695ED, 0x1B01A57B, 0x8208F4C1, 0xF50FC457,
    0x65B0D9C6, 0x12B7E950, 0x8BBEB8EA, 0xFCB9887C, 0x62DD1DDF, 0x15DA2D49, 0x8CD37CF3, 0xFBD44C65,
    0x4DB26158, 0x3AB551CE, 0xA3BC0074, 0xD4BB30E2, 0x4ADFA541, 0x3DD895D7, 0xA4D1C46D, 0xD3D6F4FB,
    0x4369E96A, 0x346ED9FC, 0xAD678846, 0xDA60B8D0, 0x44042D73, 0x33031DE5, 0xAA0A4C5F, 0xDD0D7CC9,
    0x5005713C, 0x270241AA, 0xBE0B1010, 0xC90C2086, 0x5768B525, 0x206F85B3, 0xB966D409, 0xCE61E49F,
    0x5EDEF90E, 0x29D9C998, 0xB0D09822, 0xC7D7A8B4, 0x59B33D17, 0x2EB40D81, 0xB7BD5C3B, 0xC0BA6CAD,
    0xEDB88320, 0x9ABFB3B6, 0x03B6E20C, 0x74B1D29A, 0xEAD54739, 0x9DD277AF, 0x04DB2615, 0x73DC1683,
    0xE3630B12, 0x94643B84, 0x0D6D6A3E, 0x7A6A5AA8, 0xE40ECF0B, 0x9309FF9D, 0x0A00AE27, 0x7D079EB1,
    0xF00F9344, 0x8708A3D2, 0x1E01F268, 0x6906C2FE, 0xF762575D, 0x806567CB, 0x196C3671, 0x6E6B06E7,
    0xFED41B76, 0x89D32BE0, 0x10DA7A5A, 0x67DD4ACC, 0xF9B9DF6F, 0x8EBEEFF9, 0x17B7BE43, 0x60B08ED5,
    0xD6D6A3E8, 0xA1D1937E, 0x38D8C2C4, 0x4FDFF252, 0xD1BB67F1, 0xA6BC5767, 0x3FB506DD, 0x48B2364B,
    0xD80D2BDA, 0xAF0A1B4C, 0x36034AF6, 0x41047A60, 0xDF60EFC3, 0xA867DF55, 0x316E8EEF, 0x4669BE79,
    0xCB61B38C, 0xBC66831A, 0x256FD2A0, 0x5268E236, 0xCC0C7795, 0xBB0B4703, 0x220216B9, 0x5505262F,
    0xC5BA3BBE, 0xB2BD0B28, 0x2BB45A92, 0x5CB36A04, 0xC2D7FFA7, 0xB5D0CF31, 0x2CD99E8B, 0x5BDEAE1D,
    0x9B64C2B0, 0xEC63F226, 0x756AA39C, 0x026D930A, 0x9C0906A9, 0xEB0E363F, 0x72076785, 0x05005713,
    0x95BF4A82, 0xE2B87A14, 0x7BB12BAE, 0x0CB61B38, 0x92D28E9B, 0xE5D5BE0D, 0x7CDCEFB7, 0x0BDBDF21,
    0x86D3D2D4, 0xF1D4E242, 0x68DDB3F8, 0x1FDA836E, 0x81BE16CD, 0xF6B9265B, 0x6FB077E1, 0x18B74777,
    0x88085AE6, 0xFF0F6A70, 0x66063BCA, 0x11010B5C, 0x8F659EFF, 0xF862AE69, 0x616BFFD3, 0x166CCF45,
    0xA00AE278, 0xD70DD2EE, 0x4E048354, 0x3903B3C2, 0xA7672661, 0xD06016F7, 0x4969474D, 0x3E6E77DB,
    0xAED16A4A, 0xD9D65ADC, 0x40DF0B66, 0x37D83BF0, 0xA9BCAE53, 0xDEBB9EC5, 0x47B2CF7F, 0x30B5FFE9,
    0xBDBDF21C, 0xCABAC28A, 0x53B39330, 0x24B4A3A6, 0xBAD03605, 0xCDD70693, 0x54DE5729, 0x23D967BF,
    0xB3667A2E, 0xC4614AB8, 0x5D681B02, 0x2A6F2B94, 0xB40BBE37, 0xC30C8EA1, 0x5A05DF1B, 0x2D02EF8D,
];
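
Typing out 256 constants by hand is error-prone. As a side note, here is a minimal sketch of how the same table could be generated at compile time instead, assuming the reflected CRC-32 polynomial 0xEDB88320, which is what the entries above correspond to. The names make_crc32_table, GENERATED_CRC32_TABLE and crc32_generated are made up for this illustration and are not part of the library we're building:

```rust
/// Builds the 256-entry lookup table for the reflected CRC-32 polynomial.
/// Assumption: polynomial 0xEDB88320, matching the hand-written table above.
const fn make_crc32_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut index = 0;
    while index < 256 {
        let mut entry = index as u32;
        let mut bit = 0;
        // Process the eight bits of the index, least significant bit first
        while bit < 8 {
            entry = if entry & 1 != 0 {
                (entry >> 1) ^ 0xEDB8_8320
            } else {
                entry >> 1
            };
            bit += 1;
        }
        table[index] = entry;
        index += 1;
    }
    table
}

// Evaluated by the compiler, so there is no runtime cost for building it
const GENERATED_CRC32_TABLE: [u32; 256] = make_crc32_table();

/// Same algorithm as `crc32_rust`, but using the generated table.
fn crc32_generated(data: &[u8]) -> u32 {
    let mut crc32 = 0xFFFF_FFFFu32;
    for byte in data {
        let lookup_index = (crc32 ^ *byte as u32) & 0xff;
        crc32 = (crc32 >> 8) ^ GENERATED_CRC32_TABLE[lookup_index as usize];
    }
    // Finalize the CRC-32 value by inverting all the bits
    crc32 ^ 0xFFFF_FFFF
}
```

A handy sanity check for this CRC-32 variant is the well-known check value: the ASCII input "123456789" should hash to 0xCBF43926.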

And don't forget to add "mod crc32;" to the top of your src/lib.rs file. The crc32 module defines a single function, crc32_rust, and a private constant, CRC32_TABLE. The crc32_rust function takes a byte slice, calculates its CRC32 using CRC32_TABLE, and returns the result as a u32. Definitely something we want to use from C! Let's create our extern "C" fn to expose crc32_rust to the outside world:

/// Calculate CRC32 for passed data. If len is non-zero, data must point to a valid slice in memory of length len.
///
/// # Safety
/// This function uses [std::slice::from_raw_parts] to create a slice
/// out of the passed raw pointer and length, and
/// this function exhibits Undefined Behavior in the same cases as
/// `from_raw_parts`
#[no_mangle]
pub unsafe extern "C" fn crc32(data: *const u8, len: usize) -> u32 {
    let slice = if len == 0 {
        &[]
    } else {
        std::slice::from_raw_parts(data, len)
    };

    crc32_rust(slice)
}

You'll note that I've marked this function unsafe. That's because this function can still be called from Rust, and in Rust, we have a certain standard to adhere to: no Undefined Behaviour. And the call to slice::from_raw_parts may introduce UB if certain invariants are not upheld. One of those invariants is that the pointer cannot be NULL, even if len equals 0. So we check whether len equals 0, and create an empty slice if so. It's still the caller's responsibility to ensure data points to a valid slice, though. The rest is simple: we create a slice out of the parts that we received as parameters, and call crc32_rust, passing it whichever slice we ended up with. Let's have a look at what cbindgen generates:

#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

void say_hello(void);

/**
 * Calculate CRC32 for passed data. If len is non-zero, data must point to a valid slice in memory of length len.
 *
 * # Safety
 * This function uses [std::slice::from_raw_parts] to create a slice
 * out of the passed raw pointer and length, and
 * this function exhibits Undefined Behavior in the same cases as
 * `from_raw_parts`
 */
uint32_t crc32(const uint8_t *data,
               uintptr_t len);

No surprises there! cbindgen even converted the doc comments. Nice touch. Let's update our main.c file with a call to crc32, and a print of its result:

#include <stdio.h>  // printf
#include "bindings/rust-in-c.h"

int main()
{
    uint8_t data[] = {0, 1, 2, 3, 4, 5, 6};
    size_t data_length = 7;

    uint32_t hash = crc32(data, data_length);
    printf("Hash: %u\n", hash);

    return 0;
}

Compile and run:

$ ./rust-in-c-static
Hash: 2908228089

Profit!
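
Because the wrapper is an ordinary (if unsafe) Rust function, its pointer handling can also be exercised from the Rust side, for instance in a unit test. Below is a minimal, self-contained sketch of that idea. To keep it standalone, crc32_bitwise is a table-free stand-in for crc32_rust, the wrapper is renamed to crc32_ffi, and #[no_mangle] is left out so it can't clash with the real symbol; crc32_of is a hypothetical safe helper for Rust callers:

```rust
/// Table-free stand-in for `crc32_rust`, so this sketch is self-contained.
fn crc32_bitwise(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for byte in data {
        crc ^= *byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    crc ^ 0xFFFF_FFFF
}

/// Same shape as the `crc32` wrapper above.
///
/// # Safety
/// If `len` is non-zero, `data` must point to `len` readable bytes.
pub unsafe extern "C" fn crc32_ffi(data: *const u8, len: usize) -> u32 {
    let slice: &[u8] = if len == 0 {
        // Never touch `data` for empty input: it is allowed to be NULL here
        &[]
    } else {
        unsafe { std::slice::from_raw_parts(data, len) }
    };
    crc32_bitwise(slice)
}

/// Hypothetical safe helper for Rust callers.
pub fn crc32_of(data: &[u8]) -> u32 {
    // SAFETY: pointer and length both come from one valid slice
    unsafe { crc32_ffi(data.as_ptr(), data.len()) }
}
```

The interesting case is a NULL pointer combined with a length of 0, which is exactly what the len == 0 branch is there for.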

Take it to the next level

And now for the final act: the hard part. We've messed around with pointers and slices a bit, but things get more interesting when you start handling strings and owned data over an FFI boundary. So we'll get into that. In the following example, we'll implement validation of Burgerservicenummers (BSNs), which are kind of the Dutch counterpart of Social Security Numbers. I think. Anyway, it's got a defined format: a BSN is valid if and only if it consists of 8 or 9 digits, and it passes the 11 check (Elfproef (Dutch)). This is how the 11 check goes:

  1. For 8-digit BSNs, we concatenate a 0 to the end. The digits of the number are labeled as ABCDEFGHI. For example: for BSN 123456789, A = 1, B = 2, C = 3, and so forth until I
  2. Then, (9 × A) + (8 × B) + (7 × C) + (6 × D) + (5 × E) + (4 × F) + (3 × G) + (2 × H) + (-1 × I) must be a multiple of 11
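
To make the arithmetic concrete, here's a small sketch that computes the weighted sum from step 2 for a 9-digit number. eleven_check_sum is a hypothetical helper that exists only for this illustration; the actual validation code follows further down:

```rust
/// Weighted sum from step 2 for the nine digits A through I.
fn eleven_check_sum(digits: [u8; 9]) -> i32 {
    const WEIGHTS: [i32; 9] = [9, 8, 7, 6, 5, 4, 3, 2, -1];
    WEIGHTS
        .iter()
        .zip(digits.iter())
        .map(|(weight, digit)| weight * i32::from(*digit))
        .sum()
}
```

For 999996356, the sum is 81 + 72 + 63 + 54 + 45 + 24 + 9 + 10 - 6 = 352, which equals 32 × 11, so the number passes. For 123456789, the sum is 147, which is not a multiple of 11, so that one fails the check.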

In Rust, we have the newtype pattern that among other things allows us to create types that prove their validity by their existence. Say what? Hold on, I'll explain. Let's create another module in src/bsn.rs, and declare it with a "mod bsn;" in src/lib.rs. In the bsn module, define a struct:

#[repr(C)]
pub struct Bsn {
    inner: String,
}

As Bsn has a private field, it cannot be instantiated by using the struct instantiation syntax outside of the bsn module, and therefore we need a constructor to create one. Herein lies the trick: we'll have the constructor validate the bsn String passed to it, so it has to return a Result, with some custom error type. Let's first define that error, and implement some interesting traits for it. It's not very elaborate:

#[derive(Debug)]
#[repr(C)]
pub enum Error {
    InvalidBsn,
}

impl std::error::Error for Error {}

impl std::fmt::Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Error::InvalidBsn => write!(f, "Invalid BSN number"),
        }
    }
}

And now, we add a fallible constructor for Bsn that calls an associated function called validate:

impl Bsn {
    pub fn try_new(bsn: String) -> Result<Self, Error> {
        if Self::validate(&bsn) {
            Ok(Self { inner: bsn })
        } else {
            Err(Error::InvalidBsn)
        }
    }

    pub fn validate(bsn: &str) -> bool {
        if !matches!(bsn.len(), 8 | 9) {
            return false;
        }
        let sum = [9, 8, 7, 6, 5, 4, 3, 2, -1]
            .iter()
            .zip(bsn.chars())
            .try_fold(0, |sum, (multiplier, digit)| {
                let Some(digit) = digit.to_digit(10) else {
                    return Err(Error::InvalidBsn);
                };
                Ok(sum + (multiplier * digit as i32))
            });

        let Ok(sum) = sum else {
            return false;
        };

        sum % 11 == 0
    }
}

You can skip reading the implementation of validate if you like. It basically just checks the length and performs the 11 check. As you can see, Bsn::try_new will only produce a Bsn if the string passed to it represents a valid BSN. Therefore, if we succeeded at creating a Bsn, we know it must be valid. Very cool, now we can have other functionality build upon that assumption. But for our purposes, we want to create a Bsn from our C application, and if that were to go wrong, we want to report the reason. It'd also be nice to just be able to call the Bsn::validate function from C. Let's see what we need:

  1. A function that takes a reference to a string and calls Bsn::validate, returning its result.
  2. A function that takes a string and calls Bsn::try_new, and returns a Result<Bsn, Error>, indicating whether it succeeded, and if not, why.
  3. A function that takes a reference to an Error, and uses its Display implementation to create an error message.

Let's start out with the first one: exposing Bsn::validate. Here we go:

#[no_mangle]
extern "C" fn bsn_validate(bsn: &str) -> bool {
    Bsn::validate(bsn)
}

Run cargo check:

$ cargo check
[...]
warning: `extern` fn uses type `str`, which is not FFI-safe
  --> src/bsn.rs:52:33
   |
52 | extern "C" fn bsn_validate(bsn: &str) -> bool {
   |                                 ^^^^ not FFI-safe
   |
   = help: consider using `*const u8` and a length instead
   = note: string slices have no C equivalent
   = note: `#[warn(improper_ctypes_definitions)]` on by default

Uh oh. str is not FFI-safe. That makes sense, strs are slices of UTF-8 encoded strings, and have no C equivalent. The warning suggests using *const u8 instead. Things are already getting more complicated. But apart from that: let's first make this warning an error, since it's critical for our application that the types exposed are FFI-safe. To do that, you can add the following line to the top of your src/lib.rs file:

#![deny(improper_ctypes_definitions)]

And now to fix our newly created error! So as we know, Rust's strings differ from C strings. In Rust, &str is a UTF-8 encoded string slice, represented as a pointer to the start of that slice, and a length. In C, string encoding is not settled upon, and strings are represented as just a char[], an array of bytes, which again is represented as a pointer: char *. To find the end of it, you need to find the \0 character, denoting the end of the string. We need to do some conversion to make things work.
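
The difference is easy to see with the standard library's CStr type, which models a borrowed, NUL-terminated C string. The following is a minimal sketch; c_bytes_to_str is a hypothetical helper used only for illustration, and it works on a byte slice rather than a raw pointer so that it stays entirely safe:

```rust
use std::ffi::CStr;

/// Interprets `bytes` as a C string and tries to view it as a `&str`.
fn c_bytes_to_str(bytes: &[u8]) -> Option<&str> {
    // Fails unless there is exactly one NUL byte, located at the very end
    let c_str = CStr::from_bytes_with_nul(bytes).ok()?;
    // Fails if the bytes in front of the NUL are not valid UTF-8
    c_str.to_str().ok()
}
```

So two separate things can go wrong on the way from a C string to a &str: the terminator can be missing or misplaced, and the contents can be something other than UTF-8.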

Let's go! First, we update the signature of bsn_validate and add a little todo!:

#[no_mangle]
extern "C" fn bsn_validate(bsn: *const std::ffi::c_char) -> bool {
    let bsn = todo!("convert bsn to &str");
    Bsn::validate(bsn)
}

If you've looked at the signature thoroughly, you'll have seen that *const c_char differs from what Rust suggested. c_char is a type alias that corresponds to C's char: on many platforms, such as x86-64 Linux, that's i8, a signed byte, while on others, such as AArch64 Linux, it's u8. Using c_char instead of i8 or u8 here has cbindgen generate a signature that takes a char *, a C string, where it would otherwise generate one that takes an int8_t * or uint8_t *, and that results in an implicit cast on the C side as well as a warning from Clang. And we don't like warnings, of course.

To do the actual conversion, we can use CStr::from_ptr, which is unsafe as it doesn't check the validity of the pointer or whether the string ends with a \0 character. How about we offload that responsibility to the user? Let's mark this function unsafe, and refer to the notes on safety of CStr::from_ptr. Now, we need to ensure the string is UTF-8 encoded, which we can do in a number of ways: we can offload this responsibility to the user again, or we can check the encoding at runtime. Which one you should choose really depends on your use case. For illustration purposes, I'm going for CStr::to_str, and unwrap its result:

/// Checks whether the passed string represents a valid BSN.
/// Panics if the passed string is not UTF-8 encoded.
///
/// # Safety
/// This function uses [std::ffi::CStr::from_ptr] to create a `CStr`
/// out of the passed raw pointer, and
/// this function exhibits Undefined Behavior in the same cases as
/// `from_ptr`.
#[no_mangle]
unsafe extern "C" fn bsn_validate(bsn: *const std::ffi::c_char) -> bool {
    let bsn = std::ffi::CStr::from_ptr(bsn).to_str().unwrap();
    Bsn::validate(bsn)
}

Now, as I mentioned, whether you want to check string validity at runtime, or have the user pinky promise that the pointer points to a valid, UTF-8 encoded sequence of bytes ending in a \0 character, really depends on your use case. I'm more of a fan of runtime checking myself, but that does have performance implications.

All right, now cargo check is happy again! Let's have a look at the header:

bool bsn_validate(const char *bsn);

Great, now let's try it out. We'll do that in a loop, so we can test a couple of strings. In main.c:

#include <stdio.h>  // printf
#include <string.h> //strlen
#include "bindings/rust-in-c.h"

int main()
{
    const char *bsn_strs[] = {"999996356", "1112223333", "bogus!",  "\xFE\xFF"};
    for (int i = 0; i < 4; i++)
    {
        const char *bsn_str = bsn_strs[i];
        if (bsn_validate(bsn_str)) {
            printf("%s is a valid BSN!\n", bsn_str);
        } else {
            printf("%s is an invalid BSN!\n", bsn_str);
        }
    }

    return 0;
}

The first one represents a valid BSN, the second and third one invalid BSNs, and the last one is not even a valid string. It does end with a \0, but it's not valid UTF-8. Validating the last one should result in a panic. Compile and run:

$ ./rust-in-c-dynamic
999996356 is a valid BSN!
1112223333 is an invalid BSN!
bogus! is an invalid BSN!
thread '<unnamed>' panicked at src/bsn.rs:62:54:
called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
fish: Job 1, './rust-in-c-dynamic' terminated by signal SIGABRT (Abort)

Whoa! We even get a panic instead of gibberish for the last one! However, we're still not there yet. Panicking in libraries is frowned upon, and unwinding over FFI boundaries is Undefined Behavior. Kinda defeats the purpose of using Rust, right? You can see Rust isn't happy about this from the 'fatal runtime error: failed to initiate panic' message in the output. Again, you can do multiple things:

  • Compile your Rust crate with panic=abort, having the library just abort the program on panic, without unwinding;
  • Use catch_unwind to catch panics at the FFI boundary, and turn them into results or other error-indicating values;
  • Avoid panicking code, and have your function return a result.
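
To give an impression of the second option, here's a minimal sketch that uses std::panic::catch_unwind at the boundary. The function names and the 1 / 0 / -1 return convention are assumptions made up for this example, digits_only_or_panic is merely a stand-in for a routine that may panic, and #[no_mangle] is left out:

```rust
use std::ffi::{c_char, CStr};
use std::panic::catch_unwind;

/// Stand-in for a routine that may panic: it unwraps the UTF-8
/// conversion and then only checks that every character is a digit.
fn digits_only_or_panic(input: &CStr) -> bool {
    let text = input.to_str().unwrap(); // panics on invalid UTF-8
    !text.is_empty() && text.chars().all(|c| c.is_ascii_digit())
}

/// Returns 1 for "valid", 0 for "invalid" and -1 if the Rust code panicked.
///
/// # Safety
/// `input` must point to a valid, NUL-terminated C string.
pub unsafe extern "C" fn digits_only_caught(input: *const c_char) -> i32 {
    let c_str = unsafe { CStr::from_ptr(input) };
    // Stop the panic here, before it can unwind into the C caller
    match catch_unwind(|| digits_only_or_panic(c_str)) {
        Ok(true) => 1,
        Ok(false) => 0,
        Err(_) => -1,
    }
}
```

Keep in mind that catch_unwind only catches panics that unwind. If the crate is compiled with panic=abort, there is nothing to catch and the process simply aborts.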

We'll go for the last option: let's fix bsn_validate by simply having it return false in case the passed string is not valid UTF-8:

/// Checks whether the passed string represents a valid BSN.
/// Returns `false` if the passed string is not UTF-8 encoded.
///
/// # Safety
/// This function uses [CStr::from_ptr] to create a [CStr]
/// out of the passed raw pointer, and
/// this function exhibits Undefined Behavior in the same cases as
/// `from_ptr`.
#[no_mangle]
unsafe extern "C" fn bsn_validate(bsn: *const std::ffi::c_char) -> bool {
    let Ok(bsn) = std::ffi::CStr::from_ptr(bsn).to_str() else {
        return false
    };
    Bsn::validate(bsn)
}

So. Loads of stuff to consider when passing strings. But our bsn_validate is done! Off to the next item on our list: exposing Bsn::try_new.

Let's create one of those extern "C" fns again, and have it call Bsn::try_new. I'll get started with a naive implementation:

#[no_mangle]
unsafe extern "C" fn bsn_try_new(bsn: *const std::ffi::c_char) -> Result<Bsn, Error> {
    let Ok(bsn) = std::ffi::CStr::from_ptr(bsn).to_str() else {
        return Err(Error::InvalidBsn);
    };
    Bsn::try_new(bsn.to_string())
}

There are a couple of considerations again. First off, as the Bsn struct wraps a String, which represents an owned, heap-allocated, UTF-8 encoded Rust string, we'll have to copy the passed C string into a fresh allocation. Not very performant. Can we do without? And then the second thing, which is very much blaring in our faces:

$ cargo check
[...]
error: `extern` fn uses type `Result<Bsn, bsn::Error>`, which is not FFI-safe
  --> src/bsn.rs:69:67
   |
69 | unsafe extern "C" fn bsn_try_new(bsn: *const std::ffi::c_char) -> Result<Bsn, Error> {
   |                                                                   ^^^^^^^^^^^^^^^^^^ not FFI-safe
   |
   = help: consider adding a `#[repr(C)]`, `#[repr(transparent)]`, or integer `#[repr(...)]` attribute to this enum
   = note: enum has no representation hint

[Insert 'here we go again' meme here]. This time, the error is caused by Result not being FFI-safe itself. We'll have to fix that by creating our own surrogate Result type, which is not generic:

#[repr(C)]
enum BsnTryNewResult {
    BsnTryNewResultOk(Bsn),
    BsnTryNewResultErr(Error),
}

impl From<Result<Bsn, Error>> for BsnTryNewResult {
    fn from(res: Result<Bsn, Error>) -> Self {
        match res {
            Ok(bsn) => Self::BsnTryNewResultOk(bsn),
            Err(e) => Self::BsnTryNewResultErr(e),
        }
    }
}

#[no_mangle]
unsafe extern "C" fn bsn_try_new(bsn: *const std::ffi::c_char) -> BsnTryNewResult {
    let Ok(bsn) = std::ffi::CStr::from_ptr(bsn).to_str() else {
        return Err(Error::InvalidBsn).into();
    };
    Bsn::try_new(bsn.to_string()).into()
}

Yay! Solved the error caused by Result not being FFI-safe! Let's try again:

$ cargo check
[...]
error: `extern` fn uses type `String`, which is not FFI-safe
  --> src/bsn.rs:85:67
   |
85 | unsafe extern "C" fn bsn_try_new(bsn: *const std::ffi::c_char) -> BsnTryNewResult {
   |                                                                   ^^^^^^^^^^^^^^^ not FFI-safe
   |
   = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
   = note: this struct has unspecified layout

String? Where did that come from? Well, our Bsn type wraps a String, and as such the BsnTryNewResult, which may contain a Bsn, is not FFI-safe, even though we did annotate it with a #[repr(C)] as the error message suggests. Again, we have choices to make:

  • We could make bsn_try_new produce an opaque pointer. In that case, C code would not be able to use the String, but as we can't communicate reference lifetimes using the C ABI, we'd have to figure out how to make sure the String doesn't get dropped before its last use. And then we need to expose a means of destroying the Bsn, and thereby deallocating the string.
  • We could have Bsn wrap a *const c_char instead of a String, and ensure that that pointer points to a valid, UTF-8 encoded string that stays valid for the lifetime of the Bsn.

Let's go the second route. 'Cause it seems dangerous...

Unstringing things again

What I'm trying to achieve here is to keep ownership of the string at the C side of things. That allows the C code to control where the string lives, in code, on the stack, or on the heap, and for how long. This is a bit finicky though, because we'll have to trick the Rust compiler into assuming what we're doing is correct. And if we can still create and use Bsns from the Rust side, that'd be very nice as well. Here's the idea: we have the Bsn wrap a &str instead, but as &str isn't FFI-safe, we'll mimic it. Well, for this we need a *const u8 to indicate the start of the slice, a usize to store its length, and a means of indicating the lifetime of the reference. Here's what I came up with:

use std::marker::PhantomData;

#[repr(C)]
pub struct Bsn<'inner> {
    inner: *const u8,
    len: usize,
    // &str is represented as a pointer and a length,
    // but pointers have no lifetime associated with them,
    // so we add a PhantomData to allow Rust code using the Bsn
    // to be correct.
    _marker: PhantomData<&'inner ()>,
}

impl<'inner> Bsn<'inner> {
    // This constructor ensures that the lifetime of `Bsn`
    // corresponds to the lifetime of the passed `&str`
    pub fn try_new(bsn: &'inner str) -> Result<Self, Error> {
        if Self::validate(bsn) {
            Ok(Self {
                inner: bsn.as_ptr(),
                len: bsn.len(),
                _marker: PhantomData,
            })
        } else {
            Err(Error::InvalidBsn)
        }
    }
}

As long as we ensure Bsns can only be created through Bsn::try_new, we know that Bsn represents a valid BSN, and that the pointer points to data that has the same lifetime as the one associated with the Bsn that we instantiated, and that that data represents a valid UTF-8 string. So from the Rust side of things, it's all good. We could even create a Bsn::as_str method:

impl<'inner> Bsn<'inner> {
    pub fn as_str(&self) -> &str {
        unsafe {
            // Note (unsafe): Bsn can only be created from valid, UTF-8 encoded
            // strings by calling `Bsn::try_new`
            let s: &[u8] = std::slice::from_raw_parts(self.inner, self.len);
            std::str::from_utf8_unchecked(s)
        }
    }
}
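
To see what the PhantomData buys us on the Rust side, here's a condensed, self-contained sketch of the same idea. The type is renamed to BorrowedBsn to avoid confusion with the real one, its constructor only checks length and digits (the 11 check is left out for brevity), and round_trip is a hypothetical helper:

```rust
use std::marker::PhantomData;

/// Condensed variant of the borrowed `Bsn` above, for illustration only.
#[repr(C)]
pub struct BorrowedBsn<'inner> {
    inner: *const u8,
    len: usize,
    // Ties the raw pointer to the lifetime of the original `&str`
    _marker: PhantomData<&'inner ()>,
}

impl<'inner> BorrowedBsn<'inner> {
    /// Simplified check: 8 or 9 ASCII digits, without the 11 check.
    pub fn try_new(bsn: &'inner str) -> Option<Self> {
        let plausible =
            matches!(bsn.len(), 8 | 9) && bsn.bytes().all(|b| b.is_ascii_digit());
        plausible.then(|| Self {
            inner: bsn.as_ptr(),
            len: bsn.len(),
            _marker: PhantomData,
        })
    }

    pub fn as_str(&self) -> &str {
        // SAFETY: `inner` and `len` were taken from a valid `&str` that,
        // thanks to `'inner`, is still alive while `self` exists
        unsafe {
            std::str::from_utf8_unchecked(std::slice::from_raw_parts(self.inner, self.len))
        }
    }
}

/// The `BorrowedBsn` lives and dies inside the borrow of `owner`.
fn round_trip(owner: &str) -> Option<String> {
    let bsn = BorrowedBsn::try_new(owner)?;
    Some(bsn.as_str().to_owned())
}
```

Because of the 'inner parameter, the borrow checker rejects Rust code that tries to keep a BorrowedBsn around after the string it was created from has been dropped, just as it would for a plain &str.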

Now back to FFI. Sadly, we can't represent pointer lifetimes in the C ABI, so instead, we'll have the user of our function solemnly swear that the Bsn does not outlive the string used to set it up. And we'll keep Rust happy by having bsn_try_new produce a Bsn with an unconstrained lifetime, which callers may stretch all the way up to 'static, even if the string data technically doesn't live that long. Don't tell anyone. Here's the new version of BsnTryNewResult:

#[repr(C)]
enum BsnTryNewResult<'b> {
    BsnTryNewResultOk(Bsn<'b>),
    BsnTryNewResultErr(Error),
}

impl<'b> From<Result<Bsn<'b>, Error>> for BsnTryNewResult<'b> {
    fn from(res: Result<Bsn<'b>, Error>) -> Self {
        match res {
            Ok(bsn) => Self::BsnTryNewResultOk(bsn),
            Err(e) => Self::BsnTryNewResultErr(e),
        }
    }
}

And here's the updated bsn_try_new:

/// Validate a BSN string and create a Bsn object. If the BSN is invalid,
/// or if the passed string is not valid UTF-8, returns an Error.
///
/// # Safety:
/// This function uses [CStr::from_ptr] to convert the char pointer into a CStr,
/// and as such the caller must uphold the same invariants. Furthermore you
/// _must_ ensure that the produced `Bsn` does not outlive the string data passed
/// to this function.
#[no_mangle]
unsafe extern "C" fn bsn_try_new<'b>(bsn: *const std::ffi::c_char) -> BsnTryNewResult<'b> {
    let Ok(bsn) = std::ffi::CStr::from_ptr(bsn).to_str() else {
        return Err(Error::InvalidBsn).into();
    };
    Bsn::try_new(bsn).into()
}

Yep. At this point I'm certain that somewhere on the internet, someone will be unhappy with this. As am I. This function is very, very unsafe. At the Rust side, we don't ever want to call this function. It's toxic, so please put it in a private module, and place the BsnTryNewResult there as well. But C programmers are used to thinking about lifetime shenanigans. Or at least, they should be. Right?

Let's have a look at the generated header:

typedef enum Error {
  InvalidBsn,
} Error;

typedef struct Bsn {
  const uint8_t *inner;
  uintptr_t len;
} Bsn;

typedef enum BsnTryNewResult_Tag {
  BsnTryNewResultOk,
  BsnTryNewResultErr,
} BsnTryNewResult_Tag;

typedef struct BsnTryNewResult {
  BsnTryNewResult_Tag tag;
  union {
    struct {
      struct Bsn bsn_try_new_result_ok;
    };
    struct {
      enum Error bsn_try_new_result_err;
    };
  };
} BsnTryNewResult;


// < docs omitted >
struct BsnTryNewResult bsn_try_new(const char *bsn);

I have to say I'm very pleasantly surprised that cbindgen even generates a tagged union for our enum. Even if it does look pretty exotic, that is so cool! However, after all this, the option of making bsn_try_new copy the string bytes into a new allocation and produce an opaque pointer doesn't seem so bad. You would need to create some kind of destructor as well, though, which properly deallocates the string. I'll leave that as an exercise to the reader.
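
For readers who want a starting point for that exercise, here is a minimal sketch of what the owning variant could look like. The names OwnedBsn, bsn_new_owned, bsn_owned_len and bsn_free are made up for illustration, and looks_like_bsn is only a stand-in for the real validation. To keep the sketch standalone, the #[no_mangle] attribute is left out; in the actual crate these functions would carry it, just like bsn_try_new.

```rust
use std::ffi::{c_char, CStr};
use std::ptr;

/// Hypothetical owning counterpart of `Bsn`: it keeps its own copy of the
/// digits, so it no longer borrows from the caller's string.
pub struct OwnedBsn {
    inner: String,
}

/// Stand-in for the real validation; it only checks for nine ASCII digits.
fn looks_like_bsn(s: &str) -> bool {
    s.len() == 9 && s.bytes().all(|b| b.is_ascii_digit())
}

/// Copies the string into a fresh allocation and returns an opaque pointer,
/// or null if the input is null, not valid UTF-8, or fails the check.
///
/// # Safety:
/// A non-null `bsn` must satisfy the invariants of [CStr::from_ptr].
pub unsafe extern "C" fn bsn_new_owned(bsn: *const c_char) -> *mut OwnedBsn {
    if bsn.is_null() {
        return ptr::null_mut();
    }
    let cstr = unsafe { CStr::from_ptr(bsn) };
    let s = match cstr.to_str() {
        Ok(s) if looks_like_bsn(s) => s,
        _ => return ptr::null_mut(),
    };
    // Box::into_raw hands ownership of the allocation to the caller
    Box::into_raw(Box::new(OwnedBsn { inner: s.to_owned() }))
}

/// Returns the length in bytes of the stored BSN.
///
/// # Safety:
/// `bsn` must come from `bsn_new_owned` and must not have been freed.
pub unsafe extern "C" fn bsn_owned_len(bsn: *const OwnedBsn) -> usize {
    let bsn = unsafe { &*bsn };
    bsn.inner.len()
}

/// The destructor: takes ownership back and drops the allocation.
///
/// # Safety:
/// `bsn` must be null, or a pointer from `bsn_new_owned` that has not been
/// freed before.
pub unsafe extern "C" fn bsn_free(bsn: *mut OwnedBsn) {
    if !bsn.is_null() {
        drop(unsafe { Box::from_raw(bsn) });
    }
}
```

With this design the C side only ever holds a pointer to a type it cannot look into, and the input string can be freed right after the call. The price is an allocation per BSN, plus a new promise for the C programmer to keep: every non-null pointer must be passed to bsn_free exactly once.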

Let's give bsn_try_new a try! In main.c:

#include <stdio.h>  // printf
#include <string.h> //strlen
#include "bindings/rust-in-c.h"

int main()
{
    const char *bsn_strs[] = {"999996356", "1112223333", "bogus!", "\xFE\xFF\0"};
    for (int i = 0; i < 4; i++)
    {
        const char *bsn_str = bsn_strs[i];
        BsnTryNewResult bsn_result = bsn_try_new(bsn_str);
        if (bsn_result.tag == BsnTryNewResultOk) {
            printf("%s is a valid BSN!\n", bsn_str);
        } else {
            printf("%s is an invalid BSN!\n", bsn_str);
        }
    }

    return 0;
}

Compile and run:

$ ./rust-in-c-static
999996356 is a valid BSN!
1112223333 is an invalid BSN!
bogus! is an invalid BSN!
�� is an invalid BSN!

Ok, the last one turned out to be gibberish, but that's printf trying to print an invalid string without complaining. Not Rust's fault. And it's not even wrong. Apart from that, it works!
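
To see why the fourth input ends up in the error branch, it can help to look at what CStr::to_str does with those bytes in isolation. The sketch below uses only the standard library; the helper names is_valid_utf8 and printable are invented for this example.

```rust
use std::ffi::CStr;

/// Mirrors the check `bsn_try_new` performs before validating the BSN:
/// `to_str` fails if the bytes are not valid UTF-8.
pub fn is_valid_utf8(raw: &CStr) -> bool {
    raw.to_str().is_ok()
}

/// Lossy conversion for display purposes: every invalid byte sequence is
/// replaced with U+FFFD, the replacement character.
pub fn printable(raw: &CStr) -> String {
    raw.to_string_lossy().into_owned()
}
```

For the bytes \xFE\xFF, is_valid_utf8 returns false and printable yields two replacement characters. That is most likely also what your terminal does when printf hands it the raw bytes, which explains the gibberish in the output.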

Reporting errors

The last part of this journey is reporting the error in case the BSN string is not valid. I'd like to use the Display implementation of bsn::Error to do this, and I want the C side to pass in a buffer into which the error message can be written. To make things a bit easier on ourselves, the function will take a pointer to the Error, plus a *mut c_char and a usize that describe the byte buffer.

To invoke Display::fmt, we generally use macros like println!, format!, or write!. In our case, we can use the write! macro to write into something that implements std::io::Write. As it happens, std::io::Cursor is what we need: it can wrap a &mut [u8], which lets us write to in-memory buffers. In order to conjure a &mut [u8], we cast the *mut c_char to a *mut u8 and use slice::from_raw_parts_mut to create a slice. Highly unsafe, of course. This is what the whole thing looks like:

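/// Formats the error message into the passed buffer, followed by a nul
/// terminator.
///
/// # Safety:
/// This function uses [std::slice::from_raw_parts_mut] to create a byte slice from
/// `buf` and `len`, and as such the caller must uphold the same invariants.
/// `error` must point to a valid `Error`.
///
/// # Panics:
/// Panics if the message plus its nul terminator does not fit in `len` bytes.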
#[no_mangle]
unsafe extern "C" fn error_display(error: &Error, buf: *mut std::ffi::c_char, len: usize) {
    use std::io::Write;

    let buf = buf as *mut u8;
    let buf = std::slice::from_raw_parts_mut(buf, len);
    // A Cursor allows us to use `write!` on an in-memory buffer. Neat!
    let mut buf = std::io::Cursor::new(buf);
    // Don't forget to nul-terminate
    write!(&mut buf, "{}\0", error).unwrap();
}

And here's the generated C header:

void error_display(const enum Error *error, char *buf, uintptr_t len);

How do we use it? Well, let's update main.c to call error_display in case bsn_try_new indicated an error:

#include <stdio.h>  // printf
#include <string.h> //strlen
#include "bindings/rust-in-c.h"

int main()
{
    const char *bsn_strs[] = {"999996356", "1112223333", "bogus!", "\xFE\xFF"};
    for (int i = 0; i < 4; i++)
    {
        const char *bsn_str = bsn_strs[i];
        BsnTryNewResult bsn_result = bsn_try_new(bsn_str);
        if (bsn_result.tag == BsnTryNewResultOk)
        {
            printf("%s is a valid BSN!\n", bsn_str);
        }
        else
        {
            // Make sure the buffer is big enough: error_display panics if
            // the message and its nul terminator don't fit
            char buf[50];
            error_display(&bsn_result.bsn_try_new_result_err, buf, sizeof(buf));
            printf("%s is not a valid BSN! Error: %s\n", bsn_str, buf);
        }
    }

    return 0;
}

And as always, compile and run:

$ ./rust-in-c-static
999996356 is a valid BSN!
1112223333 is not a valid BSN! Error: Invalid BSN number
bogus! is not a valid BSN! Error: Invalid BSN number
�� is not a valid BSN! Error: Invalid BSN number
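
One thing to keep in mind: the unwrap in error_display panics when the buffer is too small, and a panic is not something a C caller can handle. A possible alternative, sketched here under the assumption that a boolean return value is acceptable in your API, reports the failure instead. The names error_display_checked and write_nul_terminated are hypothetical, the Error type is a stand-in that uses the message from the output above, and #[no_mangle] is left out to keep the sketch standalone.

```rust
use std::ffi::c_char;
use std::fmt;
use std::io::{Cursor, Write};

/// Stand-in for `bsn::Error`, with the message shown in the output above.
#[repr(C)]
pub enum Error {
    InvalidBsn,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::InvalidBsn => f.write_str("Invalid BSN number"),
        }
    }
}

/// Safe core: writes `msg` plus a nul terminator into `buf`.
/// Returns false if it does not fit; `buf` may then hold a partial message
/// without a terminator, so the caller should not print it.
pub fn write_nul_terminated(buf: &mut [u8], msg: &dyn fmt::Display) -> bool {
    // A Cursor over a fixed slice reports an error once the slice is full
    let mut cursor = Cursor::new(buf);
    write!(cursor, "{}\0", msg).is_ok()
}

/// Like `error_display`, but returns false instead of panicking.
///
/// # Safety:
/// A non-null `buf` must be valid for writes of `len` bytes.
pub unsafe extern "C" fn error_display_checked(
    error: &Error,
    buf: *mut c_char,
    len: usize,
) -> bool {
    if buf.is_null() {
        return false;
    }
    let buf = unsafe { std::slice::from_raw_parts_mut(buf as *mut u8, len) };
    write_nul_terminated(buf, error)
}
```

Keeping the formatting in a safe function means the unsafe part shrinks to the null check and the slice construction, and the safe part can be unit-tested from Rust without any pointers involved.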

Presto

Done! Are you still there? Hello? Yeah, that was a lot. We've seen the basics of interop between Rust and C with cbindgen, and gone all the way to passing around strings, be it from C to Rust or vice versa, be it owned or borrowed, and validating the encoding. I think cbindgen is a great help, because it gives us the C headers exactly as one would expect, including documentation, and it's very easy to configure. But there's still a lot to consider. You get all the flexibility Rust's FFI has to offer, but you need to make all the trade-offs yourself as well.

For this reason, projects like Diplomat may be of help: offloading your choices to a glue code generator may mean giving up some control, but what you get back is consistency. Stay tuned for the next blog post in this series, where we'll have a look at how to make Diplomat work for you. More interested in calling C code from Rust? That, too, is still to come.

All code examples from the Rust Interop Guide can be found in this repo.


1: B.W. Kernighan and D.M. Ritchie, The C programming language, Upper Saddle River, NJ: Prentice Hall, Inc, February 2011
