Mix in Rust with C: Delegating FFI definitions to Diplomat
This article is part of our Rust Interop Guide.
Instead of just generating C headers and leaving all the choices to you, Diplomat generates the extern "C" fns and some more glue code. In this post we'll have a look at how you can make Diplomat work for you to enable calling Rust code from C, though Diplomat is able to generate glue code for other host languages as well. Before we get started, let me be clear that Diplomat is experimental at the time of writing. It's still a bit rough around the edges in some places. Error messages are not as insightful as you're used to, and some things are not very ergonomic. But the idea is really cool and promising. And it makes wrapping Bsn a lot easier.
Set up
So how do you use it? First, you need to add Diplomat to your dependencies in Cargo.toml:
[dependencies]
diplomat = "0.5.0"
diplomat-runtime = "0.5.0"
diplomat-runtime sounds a bit heavy, but it's basically a set of functions that make working with ownership, strings and slices a bit easier. You know, all the things we stumbled over when trying to just do it with cbindgen. Diplomat also comes with a CLI app called diplomat-tool, which does the actual generation of the C header. The Diplomat README suggests installing that app using cargo install diplomat-tool, which is fine, but I like how cbindgen allows you to trigger code generation from your build script. As such, I took the liberty of re-purposing diplomat-tool. The docs rather briefly describe a gen function, which incidentally takes parameters that correspond to the options of the diplomat-tool CLI. I threw some sensible values and some defaults at it, and got it to work! First, I added diplomat-tool as a build dependency to my Cargo.toml:
[build-dependencies]
diplomat-tool = "0.7.0"
And then, in my build.rs:
use std::path::Path;
fn main() {
    // Invoke diplomat
    diplomat_tool::gen(
        // src/lib.rs is our entry file
        Path::new("src/lib.rs"),
        // we want to generate C headers
        "c",
        // Write the generated headers to the bindings folder
        Path::new("bindings/"),
        // The remaining arguments are the defaults mentioned above
        None,
        &Default::default(),
        None,
        false,
        None,
    )
    .unwrap();
}
That should be all it takes to set up diplomat-tool.
Dig in!
Now for the fun part. Diplomat allows you to mark modules for which it should generate an FFI interface, and then bindings. Let's create a new module in a file called src/bsn_diplomat.rs, of course adding mod bsn_diplomat; to the top of your src/lib.rs file.
Let's try to get the BSN example to work again, but now using Diplomat. This time, we'll go the easier route by having our Bsn wrap a String, and therefore copying the bytes of the C string passed into the exposed constructor. The lifetime of the Bsn will therefore be separate from the lifetime of the string passed to the constructor.
The first thing to do in the newly created module is to define another module, and you may call it ffi if you like. At least I did. Then slap on a #[diplomat::bridge] macro attribute:
#[diplomat::bridge]
pub mod ffi {
/* extern stuff goes here */
}
Now, everything inside this module is going to be subject to Diplomat's FFI generation. Let's put some stuff in the ffi module, and see what Diplomat makes of it.
#[diplomat::bridge]
pub mod ffi {
/// Represents a valid BSN
pub struct Bsn {
pub(crate) inner: String,
}
}
I also took the liberty of succinctly documenting the Bsn type. We can expand the diplomat::bridge macro invocation to inspect the code it generates:
mod ffi {
#[repr(C)]
#[doc = r" Represents a valid BSN"]
struct Bsn {
inner: String,
}
#[no_mangle]
extern "C" fn Bsn_destroy(this: Box<Bsn>) {}
}
You can use cargo-expand to try it out yourself. Be aware that it expands everything recursively, all the way down, so sometimes you'll have to weed through the details of macros like println!. If you're running rust-analyzer, you can use it to expand a single macro, so at least you don't get code generated by macros other than the one you're interested in, but that one macro is still expanded recursively.
There you go! Bsn got a #[repr(C)] attribute, and Diplomat even created a destructor for us! Now, as you know, this code won't work, as String is not FFI-safe. And if we compile, we get a rather uninformative error:
error: failed to run custom build command for `rust-in-c v0.1.0 (/home/hd/dev/tg/self/interop/rust-in-c)`
Caused by:
process didn't exit successfully: `/home/hd/dev/tg/self/interop/rust-in-c/target/debug/build/rust-in-c-8a771a688146f378/build-script-build` (exit status: 101)
--- stderr
thread 'main' panicked at /home/hd/.cargo/registry/src/index.crates.io-6f17d22bba15001f/diplomat_core-0.7.0/src/ast/types.rs:242:29:
Could not resolve symbol String in bsn_temp::ffi
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Hiding the details
This error occurs when you attempt to expose a type that is not in the list of types that Diplomat allows you to pass over FFI. The fix is to mark Bsn opaque:
#[diplomat::bridge]
mod ffi {
#[diplomat::opaque]
/// Represents a valid BSN
struct Bsn {
inner: String,
}
}
Now our Bsn can only be passed over FFI by pointer, so our exposed constructor will need to produce a Box<Bsn> instead of just Bsn. I'd like to keep the original Bsn::try_new around, but as that returns the Bsn itself, I need to move it outside of the ffi module, and therefore out of Diplomat's reach. As Bsn::try_new may yield a BsnError, I added that type to the ffi module as well.
use ffi::*;
#[diplomat::bridge]
mod ffi {
pub enum BsnError {
InvalidBsn,
}
#[diplomat::opaque]
/// Represents a valid BSN
pub struct Bsn {
pub(super)inner: String,
}
impl Bsn {
pub fn try_new_boxed(bsn: &str) -> Result<Box<Self>, BsnError> {
Self::try_new(bsn).map(Box::new)
}
pub fn validate(bsn: &str) -> bool {
true // For conciseness
}
}
}
impl Bsn {
pub fn try_new(bsn: &str) -> Result<Self, BsnError> {
let bsn = bsn.to_string();
if Self::validate(&bsn) {
Ok(Self { inner: bsn })
} else {
Err(BsnError::InvalidBsn)
}
}
}
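The validate stub above always returns true. For reference, here is a minimal sketch of what a real check could look like, assuming the usual eleven-test for BSNs (nine digits, weights 9 down to 2 for the first eight digits and -1 for the last). The function name validate_bsn is hypothetical, and the implementation used earlier in this guide may differ in its details.

```rust
/// Checks a candidate BSN with the eleven-test.
/// Assumption: a BSN is exactly nine ASCII digits; all-zero input is rejected.
pub fn validate_bsn(bsn: &str) -> bool {
    // Convert every character to a digit, bailing out on anything else.
    let digits: Option<Vec<i32>> = bsn
        .chars()
        .map(|c| c.to_digit(10).map(|d| d as i32))
        .collect();
    let digits = match digits {
        Some(digits) => digits,
        None => return false,
    };
    if digits.len() != 9 {
        return false;
    }
    // Weighted sum: 9*d1 + 8*d2 + ... + 2*d8 - 1*d9.
    const WEIGHTS: [i32; 9] = [9, 8, 7, 6, 5, 4, 3, 2, -1];
    let sum: i32 = digits.iter().zip(WEIGHTS.iter()).map(|(d, w)| d * w).sum();
    sum != 0 && sum % 11 == 0
}
```

Inside the ffi module, Bsn::validate could simply forward to a function like this one.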
And now Diplomat is happy again. Here's the code the diplomat::bridge attribute macro generated:
mod ffi {
#[repr(C)]
pub enum BsnError {
InvalidBsn,
}
#[doc = r" Represents a valid BSN"]
pub struct Bsn {
pub(super) inner: String,
}
impl Bsn {
pub fn try_new_boxed(bsn: &str) -> Result<Box<Self>, BsnError> {
Self::try_new(bsn).map(Box::new)
}
pub fn validate(bsn: &str) -> bool {
true
}
}
#[no_mangle]
extern "C" fn Bsn_try_new_boxed(
bsn_diplomat_data: *const u8,
bsn_diplomat_len: usize,
) -> diplomat_runtime::DiplomatResult<Box<Bsn>, BsnError> {
Bsn::try_new_boxed(unsafe {
core::str::from_utf8(core::slice::from_raw_parts(
bsn_diplomat_data,
bsn_diplomat_len,
))
.unwrap()
})
.into()
}
#[no_mangle]
extern "C" fn Bsn_validate(bsn_diplomat_data: *const u8, bsn_diplomat_len: usize) -> bool {
Bsn::validate(unsafe {
core::str::from_utf8(core::slice::from_raw_parts(
bsn_diplomat_data,
bsn_diplomat_len,
))
.unwrap()
})
}
#[no_mangle]
extern "C" fn Bsn_destroy(this: Box<Bsn>) {}
#[no_mangle]
extern "C" fn BsnError_destroy(this: Box<BsnError>) {}
}
The piece above contains the extern "C" fns that will be exposed to C. We've got a couple of destructors at the bottom. Furthermore, we can see that Diplomat represents a &str as a *const u8 and a usize. It calls slice::from_raw_parts to turn them into a slice, which is an unsafe operation. With that, it calls str::from_utf8, and unwraps the result.
Sadly, no documentation is generated to tell the reader that this is Undefined Behaviour if the invariants of slice::from_raw_parts aren't met, and that it will panic across the FFI boundary if the string is not valid UTF-8. Keep in mind, though, that Diplomat is experimental at the time of writing. Another thing to note is that our Result is turned into a DiplomatResult, which can be passed across the FFI boundary, as it's essentially a tagged union.
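To make the "tagged union" idea concrete, here is a minimal sketch of how such an FFI-safe result could be modelled in Rust. The names FfiResult and ResultValue are hypothetical stand-ins; this is not the definition from diplomat_runtime, just an illustration of the layout technique.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical payload: exactly one of the fields is initialised.
#[repr(C)]
pub union ResultValue<T, E> {
    pub ok: ManuallyDrop<T>,
    pub err: ManuallyDrop<E>,
}

/// Hypothetical FFI-safe result: the payload plus a tag telling which field is live.
/// Note: dropping this without calling into_result leaks the payload.
#[repr(C)]
pub struct FfiResult<T, E> {
    pub value: ResultValue<T, E>,
    pub is_ok: bool,
}

impl<T, E> From<Result<T, E>> for FfiResult<T, E> {
    fn from(r: Result<T, E>) -> Self {
        match r {
            Ok(ok) => FfiResult {
                value: ResultValue { ok: ManuallyDrop::new(ok) },
                is_ok: true,
            },
            Err(err) => FfiResult {
                value: ResultValue { err: ManuallyDrop::new(err) },
                is_ok: false,
            },
        }
    }
}

impl<T, E> FfiResult<T, E> {
    /// Converts back into an ordinary Result by reading the tagged field.
    pub fn into_result(mut self) -> Result<T, E> {
        // SAFETY: is_ok records which union field was initialised, and each
        // field is taken at most once because self is consumed here.
        unsafe {
            if self.is_ok {
                Ok(ManuallyDrop::take(&mut self.value.ok))
            } else {
                Err(ManuallyDrop::take(&mut self.value.err))
            }
        }
    }
}
```

The #[repr(C)] attributes are what give the struct and union a layout a C compiler can agree on; a plain Rust Result makes no such promise.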
Down by the C-side
Let's have a look at the generated C headers. Diplomat created a number of files:
$ tree bindings/
bindings/
├── BsnError.h
├── Bsn.h
├── diplomat_result_box_Bsn_BsnError.h
└── diplomat_runtime.h
bindings/diplomat_runtime.h contains some type definitions and functionality that Diplomat uses to pass slices and strings. We'll have a closer look later, as we're not currently using any of those types. bindings/BsnError.h contains the C definition of BsnError:
#ifndef BsnError_H
#define BsnError_H
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include "diplomat_runtime.h"
#ifdef __cplusplus
namespace capi {
#endif
typedef enum BsnError {
BsnError_InvalidBsn = 0,
} BsnError;
#ifdef __cplusplus
} // namespace capi
#endif
#ifdef __cplusplus
namespace capi {
extern "C" {
#endif
void BsnError_destroy(BsnError* self);
#ifdef __cplusplus
} // extern "C"
} // namespace capi
#endif
#endif
It ain't pretty. The parts between the #ifdef __cplusplus and #endif directives allow for using this binding from C++ as well as C. Each of the files generated by Diplomat contains these incantations. It's nice that this makes the code more portable, but it doesn't make things any clearer, so I'll omit them in the following excerpts. The layout of BsnError itself is just as expected: a simple enum with a single variant. We also get the destructor for BsnError: BsnError_destroy. Nothing too surprising here. So what's in the beautifully named file bindings/diplomat_result_box_Bsn_BsnError.h?
typedef struct diplomat_result_box_Bsn_BsnError {
union {
Bsn* ok;
BsnError err;
};
bool is_ok;
} diplomat_result_box_Bsn_BsnError;
Well, there you go. Diplomat turned the Result<Box<Self>, BsnError> that is returned from Bsn::try_new_boxed into a tagged union with a huge name. That seems fair, I guess. The ok variant of the union represents the Bsn behind a pointer, which makes sense as we marked it as opaque. Now for bindings/Bsn.h. I moved things around a little and omitted the clutter:
#include "diplomat_result_box_Bsn_BsnError.h"
typedef struct Bsn Bsn;
diplomat_result_box_Bsn_BsnError Bsn_try_new_boxed(const char* bsn_data, size_t bsn_len);
bool Bsn_validate(const char* bsn_data, size_t bsn_len);
void Bsn_destroy(Bsn* self);
Bsn is now simply an opaque struct: C only sees a declaration without any fields, which hides the String it wraps on the Rust side. I'm a bit disappointed that the doc comment I put on Bsn did not get copied over, but it's not unforgivable. As Bsn_try_new_boxed returns a diplomat_result_box_Bsn_BsnError, which contains just a pointer to the Bsn, we're unable to get a Bsn by value.
That's nice, because now it's clear that it's Rust's job to clean up memory allocated for the Box<Bsn>. However, we still get ownership of that Bsn, and Rust is not going to clean it up automatically: the C side has to call Bsn_destroy at some point in order for Rust to deallocate the memory associated with the Bsn, which contains the string bytes copied when calling str::to_string in Bsn::try_new, way back.
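The ownership hand-off described here can be sketched in plain Rust. The function names into_foreign and destroy are hypothetical; Diplomat's generated Bsn_destroy takes a Box<Bsn> directly, which has the same effect of dropping the allocation when the function returns.

```rust
/// Stand-in for the opaque type from the ffi module.
pub struct Bsn {
    pub inner: String,
}

/// Moves the value to the heap and gives up ownership: from here on Rust
/// will not free it unless the pointer is handed back.
pub fn into_foreign(bsn: Bsn) -> *mut Bsn {
    Box::into_raw(Box::new(bsn))
}

/// Takes ownership back and frees the value, including the String inside.
///
/// # Safety
/// `ptr` must be null or come from `into_foreign`, and must not be used
/// again after this call.
pub unsafe fn destroy(ptr: *mut Bsn) {
    if !ptr.is_null() {
        // Rebuilding the Box lets its destructor run as usual.
        drop(unsafe { Box::from_raw(ptr) });
    }
}
```

Forgetting the destroy call leaks the allocation, and calling it twice on the same pointer is a double free, which is why the C side has to track ownership by hand.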
Turnin' the crank
Let's give it a go! Here's my version of main.c:
#include <stdio.h> // printf
#include <string.h> // strlen
#include "bindings/Bsn.h"
int main(void)
{
    const char *bsn_strs[] = {"999996356", "1112223333", "bogus!", "\xFE\xFF"};
    for (int i = 0; i < 4; i++)
    {
        const char *bsn_str = bsn_strs[i];
        diplomat_result_box_Bsn_BsnError bsn_result = Bsn_try_new_boxed(bsn_str, strlen(bsn_str));
        if (bsn_result.is_ok)
        {
            printf("%s is a valid BSN!\n", bsn_str);
            // We own the Bsn now, so hand it back to Rust for deallocation
            Bsn_destroy(bsn_result.ok);
        }
        else
        {
            printf("%s is not a valid BSN!\n", bsn_str);
        }
    }
    return 0;
}
Implement Bsn::validate as before, compile and run:
$ ./rust-in-c-dynamic
999996356 is a valid BSN!
1112223333 is not a valid BSN!
bogus! is not a valid BSN!
thread '<unnamed>' panicked at src/bsn_temp.rs:3:1:
called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
fish: Job 1, './rust-in-c-dynamic' terminated by signal SIGABRT (Abort)
That all went smoothly until the fourth input, which is not valid UTF-8; then we hit a panic. This time, there's no good way to return an error value instead, as the unwrap call that resulted in this panic is generated by Diplomat. But hey, our Rust code is much more concise and readable than in the cbindgen example, and best of all: for valid UTF-8 input, it works!
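One possible workaround, sketched here under the assumption that you are willing to maintain a small hand-written function next to the Diplomat bridge, is to let the C side check the bytes before passing them on. The name bsn_is_valid_utf8 is hypothetical and not something Diplomat provides; to actually export it to C it would also need a no_mangle attribute and a declaration in a header of your own.

```rust
use std::slice;
use std::str;

/// Hypothetical pre-check: returns true if the buffer holds valid UTF-8.
///
/// # Safety
/// `data` must be null or point to `len` readable bytes.
pub unsafe extern "C" fn bsn_is_valid_utf8(data: *const u8, len: usize) -> bool {
    if data.is_null() {
        return false;
    }
    // SAFETY: the caller promises that data points to len readable bytes.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    // from_utf8 returns an Err instead of panicking, so nothing unwinds here.
    str::from_utf8(bytes).is_ok()
}
```

The C code would call this first and only invoke Bsn_try_new_boxed when it returns true. It does not remove the unwrap from the generated code; it only avoids reaching it with invalid input.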
Passing Strings
Let's address the elephant in the room: passing strings over the FFI boundary. We've been able to avoid that by marking the Bsn opaque, but if we want to print error messages, there's no way around it.
Like last time, we'll expose a function that takes a reference to a BsnError, as well as a buffer that can be written into. We'll print the buffer contents on the C side. Here's where the Diplomat runtime comes in. Among other things, it defines a type called DiplomatWriteable, which is FFI-safe and can be used to write UTF-8 strings to. After having done so, we can access its buffer in C code, and print the contents as desired. Here's the new version of src/bsn_diplomat.rs:
use ffi::*;
#[diplomat::bridge]
mod ffi {
#[derive(Debug)]
pub enum BsnError {
InvalidBsn,
}
impl BsnError {
#[allow(clippy::result_unit_err)]
pub fn fmt_display(
this: &Self,
w: &mut diplomat_runtime::DiplomatWriteable,
) -> Result<(), ()> {
use std::fmt::Write;
write!(w, "{}", this).map_err(|_e| ())
}
}
/// Represents a valid BSN
#[diplomat::opaque]
pub struct Bsn {
pub(super) inner: String,
}
impl Bsn {
pub fn try_new_boxed(bsn: &str) -> Result<Box<Self>, BsnError> {
Self::try_new(bsn).map(Box::new)
}
pub fn validate(bsn: &str) -> bool {
true // For conciseness
}
}
}
impl Bsn {
pub fn try_new(bsn: &str) -> Result<Self, BsnError> {
let bsn = bsn.to_string();
if Self::validate(&bsn) {
Ok(Self { inner: bsn })
} else {
Err(BsnError::InvalidBsn)
}
}
}
impl std::error::Error for BsnError {}
impl std::fmt::Display for BsnError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
BsnError::InvalidBsn => write!(f, "Invalid BSN number"),
}
}
}
The notable additions are the #[derive(Debug)] on BsnError, the implementations of std::error::Error and std::fmt::Display, and, more interestingly, BsnError::fmt_display. For reasons I'm unsure about, and possibly it's just a limitation that will be lifted in the future, BsnError::fmt_display can't take &self as a receiver. Doing so results in the following error:
"A non-opaque type was found behind a Box or reference, these can only be handled by-move as they get converted at the FFI boundary: BsnError"
However, the diplomat::bridge macro itself seems to be handling &self just fine. Moving on, we can see that BsnError::fmt_display takes a &mut diplomat_runtime::DiplomatWriteable as well. Having imported the std::fmt::Write trait, we can use the write! macro to format our BsnError right into the DiplomatWriteable. This kind of works like the Cursor we used before, but this time it can be used from C, too. In real life, you'd also do error handling better than I am doing right now, but I feel we've covered this. Let's look at the generated header in bindings/BsnError.h. This time, it contains an extra function:
diplomat_result_void_void BsnError_fmt_display(const BsnError* this, DiplomatWriteable* w);
Great! The diplomat_result_void_void type corresponds to the Result<(), ()> we're returning from BsnError::fmt_display, and it's just a struct wrapping a bool. The DiplomatWriteable is defined in bindings/diplomat_runtime.h as follows:
typedef struct DiplomatWriteable {
void* context;
char* buf;
size_t len;
size_t cap;
void (*flush)(struct DiplomatWriteable*);
bool (*grow)(struct DiplomatWriteable*, size_t);
} DiplomatWriteable;
DiplomatWriteable diplomat_simple_writeable(char* buf, size_t buf_size);
The implementation of diplomat_simple_writeable is provided by the diplomat_runtime crate. You can use it to create a DiplomatWriteable by passing it a char buffer and a size. Let's adapt main.c a bit to have it set up a DiplomatWriteable, and use it to write the error message into:
#include <stdio.h> // printf
#include <string.h> // strlen
#include "bindings/Bsn.h"
int main(void)
{
    const char *bsn_strs[] = {"999996356", "1112223333", "bogus!", "\xFE\xFF"};
    for (int i = 0; i < 4; i++)
    {
        const char *bsn_str = bsn_strs[i];
        diplomat_result_box_Bsn_BsnError bsn_result = Bsn_try_new_boxed(bsn_str, strlen(bsn_str));
        if (bsn_result.is_ok)
        {
            printf("%s is a valid BSN!\n", bsn_str);
            // We own the Bsn now, so hand it back to Rust for deallocation
            Bsn_destroy(bsn_result.ok);
        }
        else
        {
            // Buffer that Rust writes the error message into
            char buf[50];
            DiplomatWriteable error_message_w = diplomat_simple_writeable(buf, sizeof(buf));
            BsnError_fmt_display(&bsn_result.err, &error_message_w);
            printf("%s is not a valid BSN! Error: %s\n", bsn_str, error_message_w.buf);
        }
    }
    return 0;
}
Implement Bsn::validate again, compile and run:
$ ./rust-in-c-dynamic
999996356 is a valid BSN!
1112223333 is not a valid BSN! Error: Invalid BSN number
bogus! is not a valid BSN! Error: Invalid BSN number
thread '<unnamed>' panicked at src/bsn_temp.rs:3:1:
called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
fish: Job 1, './rust-in-c-static' terminated by signal SIGABRT (Abort)
How about that!
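For some intuition about what a fixed-size writeable does with the buffer we hand it, here is a small, self-contained sketch of a writer that formats into a caller-provided byte buffer and keeps it null-terminated. FixedBuf is a hypothetical type made up for this illustration; it is not how diplomat_runtime implements DiplomatWriteable, and its behaviour on overflow is an assumption of the sketch.

```rust
use std::fmt::{self, Write};
use std::str;

/// Hypothetical fixed-capacity writer over a borrowed byte buffer.
pub struct FixedBuf<'a> {
    buf: &'a mut [u8],
    len: usize,
}

impl<'a> FixedBuf<'a> {
    pub fn new(buf: &'a mut [u8]) -> Self {
        FixedBuf { buf, len: 0 }
    }

    /// The text written so far, without the trailing null byte.
    pub fn as_str(&self) -> &str {
        // Only whole &str values are ever copied in, so this is valid UTF-8.
        str::from_utf8(&self.buf[..self.len]).unwrap_or("")
    }
}

impl<'a> Write for FixedBuf<'a> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len.checked_add(s.len()).ok_or(fmt::Error)?;
        // Refuse the write unless one byte stays free for the terminator.
        if end >= self.buf.len() {
            return Err(fmt::Error);
        }
        self.buf[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        // Keep the buffer null-terminated so C can print it with %s.
        self.buf[end] = 0;
        Ok(())
    }
}
```

With a 50-byte buffer, write!(w, "{}", err) succeeds for a short message like ours, while a message that does not fit yields an Err rather than writing past the end of the buffer.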
Conclusion
Even though Diplomat is still a bit rough around the edges, it greatly simplifies creating glue code. The real downside it currently has is that it introduces some panicking code that may cause unwinding across the FFI boundary. By letting it make all the hard choices for us, we give up a bit of flexibility, but we gain very readable and maintainable Rust code. Of course, with Diplomat still being experimental, it's not currently viable for production-grade code, but it may well become an interesting option in the future.
All code examples from the Rust Interop Guide can be found in this repo.