Mix in Rust with C: Delegating FFI definitions to Diplomat

Henk
Embedded software engineer
The other day I came across Diplomat, an opinionated tool that makes a lot of choices for you. If you've read my previous post in this series, you'll know that this can be quite valuable. If you haven't read it yet, do so before continuing, as it'll help you appreciate the concepts in this post, and it introduces the example as well.

This article is part of our Rust Interop Guide.

Instead of just generating C headers and leaving all the choices to you, Diplomat also generates the extern "C" fns and some more glue code. In this post we'll look at how you can make Diplomat work for you to call Rust code from C, but Diplomat can generate glue code for other host languages as well. Before we get started, let me be clear that Diplomat is experimental at the time of writing. It's still a bit rough around the edges in places: error messages aren't as insightful as you're used to, and some things aren't very ergonomic. But the idea is really cool and promising, and it makes wrapping Bsn a lot easier.

Set up

So how do you use it? First, you need to add Diplomat to your dependencies in Cargo.toml:

[dependencies]
diplomat = "0.5.0"
diplomat-runtime = "0.5.0"

diplomat-runtime sounds a bit heavy, but it's basically a set of functions that make working with ownership, strings and slices a bit easier. You know, all the things we stumbled over when trying to do it all with cbindgen. Diplomat also comes with a CLI app called diplomat-tool, which does the actual generation of the C headers. The Diplomat README suggests installing that app using cargo install diplomat-tool, which is fine, but I like how cbindgen lets you trigger code generation from your build script. So I took the liberty of repurposing diplomat-tool as a library. Its docs rather briefly describe a gen function, whose parameters correspond to the options of the diplomat-tool CLI. I threw some sensible values and some defaults at it, and got it to work! First, I added diplomat-tool as a build dependency to my Cargo.toml:

[build-dependencies]
diplomat-tool = "0.7.0"

And then, in my build.rs:

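use std::path::Path;

fn main() {
    // Regenerate the bindings whenever anything in src/ changes
    println!("cargo:rerun-if-changed=src");

    // Invoke diplomat-tool as a library rather than through its CLI
    diplomat_tool::gen(
        // src/lib.rs is our entry file
        Path::new("src/lib.rs"),
        // we want to generate C headers
        "c",
        // Write the generated headers to the bindings folder
        Path::new("bindings/"),
        // No documentation output folder
        None,
        // Default docs URL generator
        &Default::default(),
        // No library config file
        None,
        // Don't silence diplomat-tool's output
        false,
        // Don't strip any prefix from generated names
        None,
    )
    .unwrap();
}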

That takes care of setting up diplomat-tool.

Dig in!

Now for the fun part. Diplomat lets you mark modules for which it should generate an FFI layer and, from that, bindings. Let's create a new module in a file called src/bsn_diplomat.rs, not forgetting to add mod bsn_diplomat; to the top of your src/lib.rs file.

Let's get the BSN example working again, but now using Diplomat. This time, we'll take the easier route: our Bsn will wrap a String, copying the bytes of the C string passed into the exposed constructor. The lifetime of the Bsn is therefore independent of the lifetime of the string passed to the constructor.
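To make that trade-off concrete, here's a simplified sketch contrasting a borrowing wrapper with an owning one. BorrowedBsn and OwnedBsn are hypothetical names used only for illustration; they are not part of Diplomat or of the example repository.

```rust
/// Borrows the caller's string, so it can never outlive that string.
struct BorrowedBsn<'a> {
    inner: &'a str,
}

/// Owns a copy of the bytes, so it lives independently of the input.
struct OwnedBsn {
    inner: String,
}

impl OwnedBsn {
    fn new(input: &str) -> Self {
        // `to_string` copies the bytes into a new heap allocation
        OwnedBsn { inner: input.to_string() }
    }
}

fn main() {
    let owned = {
        let input = String::from("999996356");
        let borrowed = BorrowedBsn { inner: &input };
        assert_eq!(borrowed.inner, "999996356");
        // Returning `borrowed` from this block would not compile,
        // because `input` is dropped at the closing brace.
        OwnedBsn::new(&input)
    };
    // The copy is still valid after `input` has been dropped.
    assert_eq!(owned.inner, "999996356");
}
```

The copy costs an allocation, but it removes any lifetime relationship between the C caller's buffer and the Bsn.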

The first thing to do in the newly created module is define another module; you may call it ffi if you like (at least, I did). Then slap a #[diplomat::bridge] macro attribute on it:

#[diplomat::bridge]
pub mod ffi {
   /* extern stuff goes here */
}

Everything inside this module is now subject to Diplomat's FFI generation. Let's put some stuff in the ffi module, and see what Diplomat makes of it.

#[diplomat::bridge]
pub mod ffi {
    /// Represents a valid BSN
    pub struct Bsn {
        pub(crate) inner: String,
    }
}

I also took the liberty of briefly documenting the Bsn type. We can expand the diplomat::bridge macro invocation to inspect the code it generates:

mod ffi {
    #[repr(C)]
    #[doc = r" Represents a valid BSN"]
    struct Bsn {
        inner: String,
    }
    #[no_mangle]
    extern "C" fn Bsn_destroy(this: Box<Bsn>) {}
}

You can use cargo-expand to try this yourself. Be aware that it expands everything recursively, all the way down, so you'll sometimes have to wade through the expansion of macros like println!. If you're running rust-analyzer, you can have it expand a single macro invocation instead. That at least keeps out code generated by unrelated macros, although it still expands recursively from there.

There you go! Bsn got a #[repr(C)] attribute, and Diplomat even created a destructor for us! Now, as you know, this code won't work, as String is not FFI-safe. And if we compile, we get a rather uninformative error:

error: failed to run custom build command for `rust-in-c v0.1.0 (/home/hd/dev/tg/self/interop/rust-in-c)`

Caused by:
  process didn't exit successfully: `/home/hd/dev/tg/self/interop/rust-in-c/target/debug/build/rust-in-c-8a771a688146f378/build-script-build` (exit status: 101)
  --- stderr
  thread 'main' panicked at /home/hd/.cargo/registry/src/index.crates.io-6f17d22bba15001f/diplomat_core-0.7.0/src/ast/types.rs:242:29:
  Could not resolve symbol String in bsn_temp::ffi
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Hiding the details

This error occurs when you attempt to expose a type that is not in the list of types that Diplomat allows you to pass over FFI. The fix is to mark Bsn opaque:

#[diplomat::bridge]
mod ffi {
    #[diplomat::opaque]
    /// Represents a valid BSN
    struct Bsn {
        inner: String,
    }
}

Now our Bsn can only be passed over FFI by pointer, so our exposed constructor needs to produce a Box<Bsn> instead of a plain Bsn. I'd like to keep the original Bsn::try_new around, but as it returns a Bsn by value, I need to move it out of the ffi module, and therefore out of Diplomat's reach. As Bsn::try_new may yield a BsnError, I added that type to the ffi module as well.

use ffi::*;

#[diplomat::bridge]
mod ffi {

    pub enum BsnError {
        InvalidBsn,
    }

    #[diplomat::opaque]
    /// Represents a valid BSN
    pub struct Bsn {
        pub(super) inner: String,
    }

    impl Bsn {
        pub fn try_new_boxed(bsn: &str) -> Result<Box<Self>, BsnError> {
            Self::try_new(bsn).map(Box::new)
        }

        pub fn validate(bsn: &str) -> bool {
            true // For conciseness
        }
    }
}

impl Bsn {
    pub fn try_new(bsn: &str) -> Result<Self, BsnError> {
        let bsn = bsn.to_string();
        if Self::validate(&bsn) {
            Ok(Self { inner: bsn })
        } else {
            Err(BsnError::InvalidBsn)
        }
    }
}

And now Diplomat is happy again. Here's the code the diplomat::bridge attribute macro generated:

mod ffi {
    #[repr(C)]
    pub enum BsnError {
        InvalidBsn,
    }
    #[doc = r" Represents a valid BSN"]
    pub struct Bsn {
        pub(super) inner: String,
    }
    impl Bsn {
        pub fn try_new_boxed(bsn: &str) -> Result<Box<Self>, BsnError> {
            Self::try_new(bsn).map(Box::new)
        }
        pub fn validate(bsn: &str) -> bool {
            true
        }
    }
    #[no_mangle]
    extern "C" fn Bsn_try_new_boxed(
        bsn_diplomat_data: *const u8,
        bsn_diplomat_len: usize,
    ) -> diplomat_runtime::DiplomatResult<Box<Bsn>, BsnError> {
        Bsn::try_new_boxed(unsafe {
            core::str::from_utf8(core::slice::from_raw_parts(
                bsn_diplomat_data,
                bsn_diplomat_len,
            ))
            .unwrap()
        })
        .into()
    }
    #[no_mangle]
    extern "C" fn Bsn_validate(bsn_diplomat_data: *const u8, bsn_diplomat_len: usize) -> bool {
        Bsn::validate(unsafe {
            core::str::from_utf8(core::slice::from_raw_parts(
                bsn_diplomat_data,
                bsn_diplomat_len,
            ))
            .unwrap()
        })
    }
    #[no_mangle]
    extern "C" fn Bsn_destroy(this: Box<Bsn>) {}

    #[no_mangle]
    extern "C" fn BsnError_destroy(this: Box<BsnError>) {}
}

The code above contains the extern "C" fns that will be exposed to C, with a couple of destructors at the bottom. We can also see that Diplomat represents a &str as a *const u8 and a usize. It calls slice::from_raw_parts to turn those into a byte slice, which is an unsafe operation, then passes that slice to str::from_utf8 and unwraps the result.
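To see those two steps in isolation, here's a sketch that splits a &str into the pointer-and-length pair a C caller would pass, then rebuilds it the same way the generated glue does. The function name rebuild_str is hypothetical, and unlike the generated code it returns the Utf8Error instead of unwrapping it.

```rust
use std::str::Utf8Error;

/// Rebuilds a `&str` from the (data, len) pair that crosses the FFI boundary.
///
/// # Safety
/// `data` must be non-null and point to `len` initialized bytes that stay
/// valid and unmodified for the lifetime `'a`.
unsafe fn rebuild_str<'a>(data: *const u8, len: usize) -> Result<&'a str, Utf8Error> {
    // Step 1: pointer + length -> byte slice (unsafe: we trust the caller)
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    // Step 2: byte slice -> &str, which checks that the bytes are valid UTF-8
    std::str::from_utf8(bytes)
}

fn main() {
    let original = "999996356";
    // What the C side hands over: a pointer to the bytes and their length
    let (data, len) = (original.as_ptr(), original.len());
    // SAFETY: `original` is a live string literal of exactly `len` bytes
    let rebuilt = unsafe { rebuild_str(data, len) }.expect("valid UTF-8");
    assert_eq!(rebuilt, original);
}
```

Only step 1 relies on the caller's promises; step 2 is a checked operation whose failure the generated glue turns into a panic by unwrapping it.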

Sadly, no documentation is generated to warn the reader that this is Undefined Behaviour if the invariants of slice::from_raw_parts aren't met, and that it will panic across the FFI boundary if the string isn't valid UTF-8. Keep in mind, though, that Diplomat is experimental at the time of writing. Also note that our Result is turned into a DiplomatResult, which can be passed across the FFI boundary, as it's essentially a tagged union.
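A rough Rust picture of such a tagged union is a #[repr(C)] struct holding a union plus a flag, which is the same shape as the C struct Diplomat emits for our result. FfiResult and FfiResultValue are hypothetical names, not diplomat-runtime's actual definitions; the sketch only shows how a Result can be packed into, and read back from, that kind of layout.

```rust
/// Mirrors the C `BsnError` enum: a single variant with discriminant 0.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BsnError {
    InvalidBsn = 0,
}

/// Only one field is meaningful at a time; the tag below says which.
#[repr(C)]
pub union FfiResultValue<T: Copy, E: Copy> {
    ok: T,
    err: E,
}

/// Hypothetical C-compatible stand-in for `Result<T, E>`: a union plus a tag.
#[repr(C)]
pub struct FfiResult<T: Copy, E: Copy> {
    value: FfiResultValue<T, E>,
    is_ok: bool,
}

impl<T: Copy, E: Copy> From<Result<T, E>> for FfiResult<T, E> {
    fn from(result: Result<T, E>) -> Self {
        match result {
            Ok(ok) => FfiResult { value: FfiResultValue { ok }, is_ok: true },
            Err(err) => FfiResult { value: FfiResultValue { err }, is_ok: false },
        }
    }
}

impl<T: Copy, E: Copy> FfiResult<T, E> {
    /// Reads the union back, using the tag to pick the active field.
    pub fn into_result(self) -> Result<T, E> {
        // SAFETY: `is_ok` records which field was written in `from`
        unsafe {
            if self.is_ok {
                Ok(self.value.ok)
            } else {
                Err(self.value.err)
            }
        }
    }
}

fn main() {
    let ok: FfiResult<u32, BsnError> = FfiResult::from(Ok(42));
    assert!(ok.is_ok);
    assert_eq!(ok.into_result(), Ok(42));

    let err: FfiResult<u32, BsnError> = FfiResult::from(Err(BsnError::InvalidBsn));
    assert!(!err.is_ok);
    assert_eq!(err.into_result(), Err(BsnError::InvalidBsn));
}
```

The tag is what makes reading the union sound: whoever consumes the struct, Rust or C, must check is_ok before touching either field.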

Down by the C-side

Let's have a look at the generated C headers. Diplomat created a number of files:

$ tree bindings/
bindings/
├── BsnError.h
├── Bsn.h
├── diplomat_result_box_Bsn_BsnError.h
└── diplomat_runtime.h

bindings/diplomat_runtime.h contains some type definitions and functionality that Diplomat uses to pass slices and strings. We'll have a closer look later, as we're not currently using any of those types. bindings/BsnError.h contains the C definition of BsnError:

#ifndef BsnError_H
#define BsnError_H
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include "diplomat_runtime.h"

#ifdef __cplusplus
namespace capi {
#endif

typedef enum BsnError {
  BsnError_InvalidBsn = 0,
} BsnError;
#ifdef __cplusplus
} // namespace capi
#endif
#ifdef __cplusplus
namespace capi {
extern "C" {
#endif

void BsnError_destroy(BsnError* self);

#ifdef __cplusplus
} // extern "C"
} // namespace capi
#endif
#endif

It ain't pretty. The parts between the #ifdef __cplusplus and #endif directives allow this binding to be used from C++ as well as C. Every file Diplomat generates contains these incantations. It's nice that this makes the code more portable, but it doesn't make things any clearer, so I'll omit them in the following excerpts. The layout of BsnError itself is just as expected: a simple enum with a single variant. We also get a destructor for BsnError: BsnError_destroy. Nothing too surprising here. So what's in the beautifully named file bindings/diplomat_result_box_Bsn_BsnError.h?

typedef struct diplomat_result_box_Bsn_BsnError {
    union {
        Bsn* ok;
        BsnError err;
    };
    bool is_ok;
} diplomat_result_box_Bsn_BsnError;

Well, there you go. Diplomat turned the Result<Box<Self>, BsnError> returned from Bsn::try_new_boxed into a tagged union with a huge name. That seems fair, I guess. The ok variant of the union holds the Bsn behind a pointer, which makes sense as we marked it as opaque. Now for bindings/Bsn.h. I moved things around a little and omitted the clutter:

#include "diplomat_result_box_Bsn_BsnError.h"

typedef struct Bsn Bsn;

diplomat_result_box_Bsn_BsnError Bsn_try_new_boxed(const char* bsn_data, size_t bsn_len);

bool Bsn_validate(const char* bsn_data, size_t bsn_len);
void Bsn_destroy(Bsn* self);

Bsn is now an opaque type: C only sees a forward declaration of the struct, which hides the String it wraps on the Rust side. I'm a bit disappointed that the doc comment I put on Bsn wasn't copied over, but that's forgivable. As Bsn_try_new_boxed returns a diplomat_result_box_Bsn_BsnError, which contains just a pointer to the Bsn, we can't get a Bsn by value.

That's nice, because it makes clear that it's Rust's job to free the memory allocated for the Box<Bsn>. However, the C side still gets ownership of that Bsn, and Rust won't clean it up automatically: C has to call Bsn_destroy at some point so that Rust can deallocate the memory associated with the Bsn, including the string bytes copied by str::to_string in Bsn::try_new earlier.
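In Rust terms, handing a Box<Bsn> to C and later calling Bsn_destroy is equivalent to Box::into_raw followed, eventually, by Box::from_raw and a drop. The sketch below illustrates that hand-off with plain Rust functions; bsn_new and bsn_destroy are hypothetical stand-ins for the generated glue, and the LIVE counter only exists to show when a Bsn is actually freed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// How many `Bsn` values are currently alive (for demonstration only).
static LIVE: AtomicUsize = AtomicUsize::new(0);

pub struct Bsn {
    inner: String,
}

impl Drop for Bsn {
    fn drop(&mut self) {
        LIVE.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Stand-in for the constructor glue: the heap allocation leaves Rust's hands.
fn bsn_new(s: &str) -> *mut Bsn {
    LIVE.fetch_add(1, Ordering::SeqCst);
    // `Box::into_raw` gives up ownership; nothing frees this automatically.
    Box::into_raw(Box::new(Bsn { inner: s.to_string() }))
}

/// Stand-in for `Bsn_destroy`: reclaims the Box and drops it, which frees the
/// `Bsn` and the bytes owned by its `String`.
///
/// # Safety
/// `this` must come from `bsn_new` and must not be used after this call.
unsafe fn bsn_destroy(this: *mut Bsn) {
    drop(unsafe { Box::from_raw(this) });
}

fn main() {
    let handle = bsn_new("999996356"); // what C receives in `.ok`
    assert_eq!(unsafe { (*handle).inner.as_str() }, "999996356");
    assert_eq!(LIVE.load(Ordering::SeqCst), 1); // still alive: C owns it
    unsafe { bsn_destroy(handle) }; // C calls Bsn_destroy
    assert_eq!(LIVE.load(Ordering::SeqCst), 0); // now it's freed
}
```

If the C side never calls the destructor, the counter never drops back: that's a memory leak, not something Rust can catch for you.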

Turnin' the crank

Let's give it a go! Here's my version of main.c:

#include <stdio.h>  // printf
#include <string.h> // strlen

#include "bindings/Bsn.h"

int main(void)
{
    const char *bsn_strs[] = {"999996356", "1112223333", "bogus!", "\xFE\xFF"};
    for (size_t i = 0; i < sizeof bsn_strs / sizeof bsn_strs[0]; i++)
    {
        const char *bsn_str = bsn_strs[i];
        diplomat_result_box_Bsn_BsnError bsn_result = Bsn_try_new_boxed(bsn_str, strlen(bsn_str));

        if (bsn_result.is_ok)
        {
            printf("%s is a valid BSN!\n", bsn_str);
            // We own the Bsn now: hand it back to Rust so its memory gets freed
            Bsn_destroy(bsn_result.ok);
        }
        else
        {
            printf("%s is not a valid BSN!\n", bsn_str);
        }
    }

    return 0;
}

Implement Bsn::validate as before, compile and run:

$ ./rust-in-c-dynamic
999996356 is a valid BSN!
1112223333 is not a valid BSN!
bogus! is not a valid BSN!
thread '<unnamed>' panicked at src/bsn_temp.rs:3:1:
called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
fish: Job 1, './rust-in-c-dynamic' terminated by signal SIGABRT (Abort)

That all went smoothly until the fourth input, "\xFE\xFF", which isn't valid UTF-8; then we hit a panic. Since the panic has nowhere to go on the C side, the process is aborted. This time, there's no good way to return an error value instead, as the unwrap call that caused this panic is generated by Diplomat. But hey, our Rust code is much more concise and readable than in the cbindgen example, and best of all: it works!
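The panic message is easy to reproduce in plain Rust: the byte 0xFE never occurs in UTF-8, so str::from_utf8 rejects the very first byte. For comparison, check_bsn_input below is a hypothetical hand-written conversion that reports invalid UTF-8 as an error value instead of panicking; it is not something Diplomat generates.

```rust
/// Hypothetical error a hand-written API could return instead of panicking.
#[derive(Debug, PartialEq)]
pub enum InputError {
    InvalidUtf8,
}

/// Converts raw input bytes to `&str`, reporting invalid UTF-8 as a value.
fn check_bsn_input(bytes: &[u8]) -> Result<&str, InputError> {
    std::str::from_utf8(bytes).map_err(|_| InputError::InvalidUtf8)
}

fn main() {
    // The fourth input from main.c
    let err = std::str::from_utf8(b"\xFE\xFF").unwrap_err();
    // Same details as in the panic: Utf8Error { valid_up_to: 0, error_len: Some(1) }
    assert_eq!(err.valid_up_to(), 0);
    assert_eq!(err.error_len(), Some(1));

    assert_eq!(check_bsn_input(b"999996356"), Ok("999996356"));
    assert_eq!(check_bsn_input(b"\xFE\xFF"), Err(InputError::InvalidUtf8));
}
```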

Passing Strings

Let's address the elephant in the room: passing strings over the FFI boundary. We've been able to avoid that by marking the Bsn opaque, but if we want to print error messages, there's no way around it.

Like last time, we'll expose a function that takes a reference to a BsnError, as well as a buffer that can be written into. We'll print the buffer contents on the C side. Here's where the Diplomat runtime comes in. Among other things, it defines DiplomatWriteable, an FFI-safe type that UTF-8 strings can be written to. Afterwards, we can access its buffer from C code and print the contents as desired. Here's the new version of src/bsn_diplomat.rs:

use ffi::*;

#[diplomat::bridge]
mod ffi {

    #[derive(Debug)]
    pub enum BsnError {
        InvalidBsn,
    }

    impl BsnError {
        #[allow(clippy::result_unit_err)]
        pub fn fmt_display(
            this: &Self,
            w: &mut diplomat_runtime::DiplomatWriteable,
        ) -> Result<(), ()> {
            use std::fmt::Write;
            write!(w, "{}", this).map_err(|_e| ())
        }
    }

    /// Represents a valid BSN
    #[diplomat::opaque]
    pub struct Bsn {
        pub(super) inner: String,
    }

    impl Bsn {
        pub fn try_new_boxed(bsn: &str) -> Result<Box<Self>, BsnError> {
            Self::try_new(bsn).map(Box::new)
        }

        pub fn validate(bsn: &str) -> bool {
            true // For conciseness
        }
    }
}

impl Bsn {
    pub fn try_new(bsn: &str) -> Result<Self, BsnError> {
        let bsn = bsn.to_string();
        if Self::validate(&bsn) {
            Ok(Self { inner: bsn })
        } else {
            Err(BsnError::InvalidBsn)
        }
    }
}

impl std::error::Error for BsnError {}

impl std::fmt::Display for BsnError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            BsnError::InvalidBsn => write!(f, "Invalid BSN number"),
        }
    }
}

The notable additions are the #[derive(Debug)] on BsnError, the implementations of std::error::Error and std::fmt::Display, and, more interestingly, BsnError::fmt_display. For reasons I'm unsure about (possibly just a limitation that will be lifted in the future), BsnError::fmt_display can't take &self as a receiver. Doing so results in the following error:

"A non-opaque type was found behind a Box or reference, these can only be handled by-move as they get converted at the FFI boundary: BsnError"

However, diplomat::bridge seems to handle this: &Self just fine. Moving on, BsnError::fmt_display also takes a &mut diplomat_runtime::DiplomatWriteable. Having imported the std::fmt::Write trait, we can use the write! macro to format our BsnError right into the DiplomatWriteable. This works a bit like the Cursor we used before, but this time it can be used from C, too. In real life, you'd also handle errors better than I'm doing here, but I feel we've covered that. Let's look at the generated header in bindings/BsnError.h. This time, it contains an extra function:

diplomat_result_void_void BsnError_fmt_display(const BsnError* this, DiplomatWriteable* w);

Great! The diplomat_result_void_void type corresponds to the Result<(), ()> we're returning from BsnError::fmt_display, and it's just a struct wrapping a bool. The DiplomatWriteable is defined in bindings/diplomat_runtime.h as follows:

typedef struct DiplomatWriteable {
    void* context;
    char* buf;
    size_t len;
    size_t cap;
    void (*flush)(struct DiplomatWriteable*);
    bool (*grow)(struct DiplomatWriteable*, size_t);
} DiplomatWriteable;

DiplomatWriteable diplomat_simple_writeable(char* buf, size_t buf_size);

The implementation of diplomat_simple_writeable is provided by the diplomat_runtime crate. You can use it to create a DiplomatWriteable from a char buffer and its size. Let's adapt main.c to set up a DiplomatWriteable and write the error message into it:

#include <stdio.h>  // printf
#include <string.h> // strlen

#include "bindings/Bsn.h"

int main(void)
{
    const char *bsn_strs[] = {"999996356", "1112223333", "bogus!", "\xFE\xFF"};
    for (size_t i = 0; i < sizeof bsn_strs / sizeof bsn_strs[0]; i++)
    {
        const char *bsn_str = bsn_strs[i];
        diplomat_result_box_Bsn_BsnError bsn_result = Bsn_try_new_boxed(bsn_str, strlen(bsn_str));

        if (bsn_result.is_ok)
        {
            printf("%s is a valid BSN!\n", bsn_str);
            // We own the Bsn now: hand it back to Rust so its memory gets freed
            Bsn_destroy(bsn_result.ok);
        }
        else
        {
            // Fixed-size stack buffer that the error message is written into
            char buf[50];
            DiplomatWriteable error_message_w = diplomat_simple_writeable(buf, sizeof buf);
            BsnError_fmt_display(&bsn_result.err, &error_message_w);

            printf("%s is not a valid BSN! Error: %s\n", bsn_str, error_message_w.buf);
        }
    }

    return 0;
}

Implement Bsn::validate again, compile and run:

$ ./rust-in-c-dynamic
999996356 is a valid BSN!
1112223333 is not a valid BSN! Error: Invalid BSN number
bogus! is not a valid BSN! Error: Invalid BSN number
thread '<unnamed>' panicked at src/bsn_temp.rs:3:1:
called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
fish: Job 1, './rust-in-c-static' terminated by signal SIGABRT (Abort)

How about that!
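Since diplomat_simple_writeable is handed a caller-owned buffer of fixed size, it's worth thinking about what happens when a message doesn't fit. As a rough Rust analogue (a hypothetical FixedWriter, not diplomat-runtime's code), the sketch below implements std::fmt::Write over a fixed byte slice, keeps the text NUL-terminated so C could print it with %s, and reports fmt::Error when the text doesn't fit.

```rust
use std::fmt::{self, Write};

/// Hypothetical writer over a fixed, caller-provided buffer that cannot grow.
struct FixedWriter<'a> {
    buf: &'a mut [u8],
    len: usize,
}

impl<'a> FixedWriter<'a> {
    fn new(buf: &'a mut [u8]) -> Self {
        // Start out as an empty, NUL-terminated string (if there's room at all)
        if let Some(first) = buf.first_mut() {
            *first = 0;
        }
        FixedWriter { buf, len: 0 }
    }

    fn as_str(&self) -> &str {
        // Only whole `&str`s are ever copied in, so this is valid UTF-8
        std::str::from_utf8(&self.buf[..self.len]).expect("only whole strs are written")
    }
}

impl Write for FixedWriter<'_> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        // Index `end` must still be free for the NUL terminator
        if end >= self.buf.len() {
            return Err(fmt::Error);
        }
        self.buf[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        self.buf[end] = 0; // keep the contents NUL-terminated for C
        Ok(())
    }
}

fn main() {
    let mut storage = [0u8; 50];
    let mut w = FixedWriter::new(&mut storage);
    write!(w, "{}", "Invalid BSN number").unwrap();
    assert_eq!(w.as_str(), "Invalid BSN number");

    let mut tiny = [0u8; 4];
    let mut w = FixedWriter::new(&mut tiny);
    assert!(write!(w, "too long").is_err());
    assert_eq!(w.as_str(), "");
}
```

Rejecting a write that doesn't fit, rather than truncating it, keeps the buffer contents valid UTF-8, at the cost of losing the whole piece of text.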

Conclusion

Even though Diplomat is still a bit rough around the edges, it greatly simplifies creating glue code. Its main downside at the moment is that it generates panicking code, which may end up trying to unwind the stack across the FFI boundary. By making all the hard choices for us, it takes away a bit of flexibility, but we gain very readable and maintainable Rust code. Of course, as Diplomat is still experimental, it's not yet viable for production-grade code, but it may well become an interesting option in the future.

All code examples from the Rust Interop Guide can be found in this repo.
