Mix in C with Rust: A taste of C in your Rust

Henk
Embedded software engineer
Can't wait to learn how to call C code from your Rust project after reading my previous posts about Rust interop in general and calling Rust from C? Good! If you haven't read those yet, please do, because I'm going to assume you have in this article. We've seen the basics of Rust's FFI, and have experimented with calling Rust from C. Going the other way around, you'll walk into much the same challenges.

This article is part of our Rust Interop Guide.

C has been around for ages, and there are libraries for just about everything in C. Rust is much younger, and you'll have found that the Rust ecosystem is not as mature as C's by far. It's nice to be able to use what's already out there. On top of that, C being the lingua franca of FFI, closed-source libraries are often distributed as a binary blob exposing an interface based on the C ABI, along with a C header. To use them, your Rust application will have to speak some C when including such libraries. Being able to use libraries written in C from Rust is a great skill to have. Below, we'll have a look at how to go about that.

Bring on the C

If you want to call into C from Rust, probably the best way to go about it is to use Bindgen.

The C type system being far more bare-bones than Rust's, tooling that generates Rust bindings from C code will need a lot of assistance from the developer. We developers need to employ whatever we know about the library to wrap and augment the generated bindings. To show how this works, we're going to wrap the TweetNaCl cryptography library. TweetNaCl is a relatively small cryptography library, written in C. If you're interested, you can find the source code here. One of the contributors already did the work of wrapping TweetNaCl in Rust, in a crate called tweetnacly. Examples in this post are based on his work.

All right, let's take the TweetNaCl header file and release Bindgen on it. You can use Bindgen as a CLI app called bindgen-cli, or as a library that is invoked in your build script. We'll do the latter. Let's create a crate for the generated bindings. We'll call it tweetnacl-sys, the -sys suffix being the conventional marker for a crate that exposes raw bindings to a non-Rust library:

$ cargo new --lib tweetnacl-sys
$ cd tweetnacl-sys

Create three files in the root of the just-created crate: build.rs, tweetnacl.c and tweetnacl.h.

Fill the latter two files with the original TweetNaCl source and header:

curl -s https://tweetnacl.cr.yp.to/20140427/tweetnacl.c -o tweetnacl.c
curl -s https://tweetnacl.cr.yp.to/20140427/tweetnacl.h -o tweetnacl.h

The file structure should look like this:

$ tree
.
├── build.rs
├── Cargo.toml
├── src
│   └── lib.rs
├── tweetnacl.c
└── tweetnacl.h

Now for the build script. Let's start by adding Bindgen as build dependency:

$ cargo add --build bindgen

Now we configure Bindgen in the build.rs file:

fn main() {
    println!("cargo:rerun-if-changed=tweetnacl.c");
    println!("cargo:rerun-if-changed=tweetnacl.h");

    let bindings = bindgen::builder()
        .header("tweetnacl.h")
        .generate()
        .expect("Unable to generate bindings to tweetnacl.h");

    let out_path = std::env::var("OUT_DIR").unwrap();
    let out_path = std::path::Path::new(&out_path);
    bindings
        .write_to_file(out_path.join("tweetnacl_bindings.rs"))
        .expect("Couldn't write bindings!");
}

Pretty straightforward. The OUT_DIR environment variable points to a folder where the build script is supposed to write any output to. If you compile your crate with cargo build, it'll be placed in target/debug/build/tweetnacl-sys-[SOME_BIG_HASH]/out/tweetnacl_bindings.rs. Here's a glimpse of the contents:

[...]
pub const crypto_verify_16_tweet_VERSION: &[u8; 2] = b"-\0";
pub const crypto_verify_16_BYTES: u32 = 16;
pub const crypto_verify_16_VERSION: &[u8; 2] = b"-\0";
pub const crypto_verify_16_IMPLEMENTATION: &[u8; 23] = b"crypto_verify/16/tweet\0";
pub const crypto_verify_32_tweet_BYTES: u32 = 32;
pub const crypto_verify_32_tweet_VERSION: &[u8; 2] = b"-\0";
pub const crypto_verify_32_BYTES: u32 = 32;
pub const crypto_verify_32_VERSION: &[u8; 2] = b"-\0";
pub const crypto_verify_32_IMPLEMENTATION: &[u8; 23] = b"crypto_verify/32/tweet\0";
extern "C" {
    pub fn crypto_auth_hmacsha512256_tweet(
        arg1: *mut ::std::os::raw::c_uchar,
        arg2: *const ::std::os::raw::c_uchar,
        arg3: ::std::os::raw::c_ulonglong,
        arg4: *const ::std::os::raw::c_uchar,
    ) -> ::std::os::raw::c_int;
}
[...]

Yeah... The header consists of a load of #defines and extern functions, and the generated bindings are literal translations of the contents of tweetnacl.h. Not very usable at all. But there's another problem: the Rust compiler is unable to build C source code. That's going to be a problem when we include tweetnacl-sys into our Rust application: the linker is going to need the TweetNaCl build artifacts in order to finalize the build process. To make our crate easily buildable and includable like any other Rust crate, we'll have to instruct the system's default C compiler to compile TweetNaCl to a static archive, and instruct Rust to link it statically. Invoking the C compiler can be done using the cc crate. Let's add it to our build dependencies:

$ cargo add --build cc

In your build script, add the following lines to the end of fn main:

cc::Build::new()
    .warnings(false)
    .extra_warnings(false)
    .file("tweetnacl.c")
    .compile("tweetnacl");

We're suppressing all warnings that result from compiling the C source here; whether that's acceptable depends on how much you trust the C code. In our case, TweetNaCl is very much optimized for source code size and auditability [1], and compiling it does yield a whole bunch of warnings we're conveniently not going to go into. But what's cool is that the target/debug/build/tweetnacl-sys-[SOME_BIG_HASH]/out/ folder now contains an archive file called libtweetnacl.a (on Unix-like systems), which the linker can use to stitch together the final binary! The cc crate also instructs Cargo to link the just-compiled archive into the final binary. So linking's sorted too!
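For reference, those linking instructions are nothing more than specially formatted lines that a build script prints to standard output, which Cargo then picks up. Below is a minimal sketch of roughly what cc emits on our behalf, assuming a static library named tweetnacl that lives in the OUT_DIR folder. The helper name link_directives is made up for illustration and is not part of cc or Cargo:

```rust
/// Builds the two Cargo directives a build script could print to link a
/// static archive by hand. `link_directives` is a hypothetical helper;
/// the `cc` crate takes care of this for us.
fn link_directives(out_dir: &str, lib_name: &str) -> [String; 2] {
    [
        // Tell rustc in which folder to look for native libraries.
        format!("cargo:rustc-link-search=native={out_dir}"),
        // Tell rustc to link the named library statically.
        format!("cargo:rustc-link-lib=static={lib_name}"),
    ]
}
```

In a hand-written build.rs you would print each of these lines with println!, passing the value of the OUT_DIR environment variable as out_dir. With cc in place, there's no need to do so yourself.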

We need to do one more thing to finalize our tweetnacl-sys crate: actually make the generated bindings available from Rust. Here's what you put in src/lib.rs to do so:

#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]

include!(concat!(env!("OUT_DIR"), "/tweetnacl_bindings.rs"));

As you can see, it just uses the include! macro to include the generated bindings directly. If you have cargo-expand installed, you can run cargo expand to see the expansion. As the bindings are ugly and violate Rust's code style rules, we have to suppress some warnings on that, using the #![allow(...)] attributes. Done! We have a crate that we can simply include as a dependency in our application's Cargo.toml file, and Cargo will happily build it for us! Are we happy, though? Well, you shouldn't be. Because directly interacting with the sys-crate is still going to be a horrible experience. This is where you come in!

Wraptastic!

Let's set up another crate. It's going to be called tweetnacl, and it'll serve as an ergonomic wrapper around tweetnacl-sys:

$ cd ..
$ cargo new --lib tweetnacl
$ cd tweetnacl
$ cargo add --path ../tweetnacl-sys

One of the functions that Bindgen generated is crypto_hash_sha512_tweet, and it's defined like this:

extern "C" {
    pub fn crypto_hash_sha512_tweet(
        arg1: *mut ::std::os::raw::c_uchar,
        arg2: *const ::std::os::raw::c_uchar,
        arg3: ::std::os::raw::c_ulonglong,
    ) -> ::std::os::raw::c_int;
}

The parameter names were made up by Bindgen, and not very descriptive. But we can have a look at the implementation, which is given by crypto_hash in tweetnacl.c. Took me a while to figure out how that works, but let's say I'm glad Rust's macros are less powerful than C's #define.

int crypto_hash(u8 *out,const u8 *m,u64 n)
{
  u8 h[64],x[256];
  u64 i,b = n;

  FOR(i,64) h[i] = iv[i];

  crypto_hashblocks(h,m,n);
  m += n;
  n &= 127;
  m -= n;

  FOR(i,256) x[i] = 0;
  FOR(i,n) x[i] = m[i];
  x[n] = 128;

  n = 256-128*(n<112);
  x[n-9] = b >> 61;
  ts64(x+n-8,b<<3);
  crypto_hashblocks(h,x,n);

  FOR(i,64) out[i] = h[i];

  return 0;
}

If you're wondering about the u8 and u64, these are typedefs declared at the top of the file. The FOR(i, n) is a macro, also defined at the top, and it expands to for (i = 0;i < n;++i). Why? Probably because it's shorter. So anyway, the statement that follows FOR(i, n) is repeated n times. Apart from that, we note a couple of things concerning this function:

  • It never returns anything other than 0. Instead, it writes to the memory location given by the out pointer parameter. So that's the actual return value of this function.
  • out is only referred to on line 583 of tweetnacl.c, in the final FOR loop, where 64 consecutive bytes are written to it, one at a time. out therefore has to point to a buffer of 64 bytes.
  • A hash function needs an input of variable size, and as we're now certain that out isn't it, that'll leave m, and n is going to be its length.
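The least obvious line in crypto_hash is probably n = 256-128*(n<112), which picks the size of the final, padded buffer. Here is a minimal sketch of that arithmetic in Rust, assuming the usual SHA-512 layout of 128-byte blocks, a single marker byte (the 128 written to x[n]) and a 16-byte length field at the end. The function name padded_tail_len is made up for illustration:

```rust
/// Size of the padded tail buffer that `crypto_hash` passes to its final
/// `crypto_hashblocks` call. `padded_tail_len` is a hypothetical name.
fn padded_tail_len(message_len: u64) -> usize {
    // `n &= 127`: the bytes left over after all complete 128-byte blocks.
    let remainder = (message_len & 127) as usize;
    // The remainder, the marker byte and the 16-byte length field fit in
    // a single block if remainder + 1 + 16 <= 128, i.e. remainder < 112.
    if remainder < 112 { 128 } else { 256 }
}
```

For a 13-byte input such as "Hello, world!", this yields 128: the whole message ends up in a single padded block.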

We could also have had a look at the documentation, but where's the fun in that? Now, in Rust, we'd do things a bit differently. First off, if the return value is going to be a series of bytes with a fixed length, we'd return an array. Secondly, we'd pass slices for variable length input. Therefore, this is what we'll put in src/lib.rs in the tweetnacl crate:

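pub fn hash_sha512(bytes: &[u8]) -> [u8; 64] {
    // The C function writes exactly 64 bytes to its first argument.
    let mut out = [0u8; 64];
    unsafe {
        // The return value is always 0, so we can ignore it.
        tweetnacl_sys::crypto_hash_sha512_tweet(
            out.as_mut_ptr(),
            bytes.as_ptr(),
            bytes.len() as u64,
        )
    };
    out
}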

Look at that signature. Way better, right? The function could do with a bit of documentation, but hey, it's a big improvement already. Just one thing is bugging me still: we're initializing a big array here, and overwriting its contents before ever reading from it. Can we improve things?

Uninit

Yes, we can! Working with uninitialized memory is one of the things that freak out a Rust developer a bit. Luckily for us, the standard library provides a well-documented way to construct uninitialized instances of data: std::mem::MaybeUninit<T>. Using it may still cause Undefined Behaviour, so we'll be cautious. For instance, MaybeUninit is a footgun when used with bools or references, as not every bit pattern is a valid value for those types. Treating a reference as initialized when its memory is zeroed, or a bool when it holds a value other than 0 or 1, is instant Undefined Behaviour! Even if you never use the value. The compiler relies on guarantees about the values certain types can have to optimize your code. Luckily for us, an array of bytes contains no padding, and as long as the size is correct and every byte has actually been written, it may contain any value. So if we do the following in src/lib.rs, we're in the clear:

use std::mem::MaybeUninit;

pub fn hash_sha512(bytes: &[u8]) -> [u8; 64] {
    let mut out: MaybeUninit<[u8; 64]> = MaybeUninit::uninit();
    unsafe {
        tweetnacl_sys::crypto_hash_sha512_tweet(
            out.as_mut_ptr().cast(),
            bytes.as_ptr(),
            bytes.len() as u64,
        );

        out.assume_init()
    }
}

With MaybeUninit::uninit(), we're reserving stack memory to write the hash to, without initializing it first. Once crypto_hash_sha512_tweet is done, we can assume that out is actually initialized, which is what the call to out.assume_init() expresses.
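If you'd like to play with this pattern without building TweetNaCl first, here is a self-contained sketch of the same idea. It assumes a stand-in for the C function, written in Rust with a C-compatible signature. Both fill_counting and counting_bytes are hypothetical names and not part of TweetNaCl or the crates in this post:

```rust
use std::mem::MaybeUninit;
use std::os::raw::{c_int, c_uchar};

/// Stand-in for a C function with an out-pointer parameter, written in
/// Rust so the sketch compiles on its own. Hypothetical.
///
/// Safety: `out` must be valid for writes of 64 bytes.
unsafe extern "C" fn fill_counting(out: *mut c_uchar) -> c_int {
    for i in 0..64usize {
        // Write through the raw pointer without reading the old contents.
        unsafe { out.add(i).write(i as c_uchar) };
    }
    0
}

/// Safe wrapper that follows the same pattern as `hash_sha512`.
pub fn counting_bytes() -> [u8; 64] {
    let mut out: MaybeUninit<[u8; 64]> = MaybeUninit::uninit();
    unsafe {
        // `cast` turns `*mut [u8; 64]` into `*mut u8` without creating a
        // reference to the uninitialized array.
        let status = fill_counting(out.as_mut_ptr().cast());
        debug_assert_eq!(status, 0);
        // Sound only because the callee has written all 64 bytes.
        out.assume_init()
    }
}
```

The important design point is that the wrapper is only sound because we know the callee initializes every byte. If it wrote fewer than 64 bytes, calling assume_init would be Undefined Behaviour.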

Will it blend?

The litmus test: trying it out. You should now be able to use the tweetnacl crate in your Rust project, and try it out that way. Myself, I'm going to create a unit test, which, if it works, should prove the wrapper is correct. I tried a random online SHA512 hasher tool, and as it turns out, the hash of "Hello, world!" is 0xc1527cd893c124773d811911970c8fe6e857d6df5dc9226bd8a160614c0cd963a4ddea2b94bb7d36021ef9d865d5cea294a82dd49a0bb269f51f6e7a57f79421. Let's verify that in our test. In tweetnacl/src/lib.rs:

#[cfg(test)]
mod test {
    use crate::hash_sha512;

    #[test]
    fn it_hashes() {
        let bytes = b"Hello, world!";

        let the_hash = hash_sha512(bytes);
        assert_eq!(
            the_hash,
            [
                0xc1, 0x52, 0x7c, 0xd8, 0x93, 0xc1, 0x24, 0x77, 0x3d, 0x81, 0x19, 0x11, 0x97, 0xc,
                0x8f, 0xe6, 0xe8, 0x57, 0xd6, 0xdf, 0x5d, 0xc9, 0x22, 0x6b, 0xd8, 0xa1, 0x60, 0x61,
                0x4c, 0xc, 0xd9, 0x63, 0xa4, 0xdd, 0xea, 0x2b, 0x94, 0xbb, 0x7d, 0x36, 0x2, 0x1e,
                0xf9, 0xd8, 0x65, 0xd5, 0xce, 0xa2, 0x94, 0xa8, 0x2d, 0xd4, 0x9a, 0xb, 0xb2, 0x69,
                0xf5, 0x1f, 0x6e, 0x7a, 0x57, 0xf7, 0x94, 0x21,
            ]
        );
    }
}
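Typing out 64 byte literals by hand is error-prone. As an alternative, a small helper could decode the hex string produced by the online tool. This is a minimal sketch using only the standard library; decode_hex is a hypothetical test helper and not part of the original crate:

```rust
/// Decodes a hex string such as "c1527c" into bytes. Returns `None` if
/// the length is odd or if any character is not a hex digit.
/// Hypothetical helper for use in tests.
fn decode_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        // Each pair of hex digits becomes one byte.
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}
```

In the test, the expected value could then be obtained by passing the full hex string (without the 0x prefix) to decode_hex, and compared against the_hash.to_vec().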

Run the test:

$ cargo test -q

running 1 test
.
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s


running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

That's a success!

Conclusion

There are loads of functions exposed by tweetnacl.h to give the same treatment, but I think the general idea should be clear: you run Bindgen on your C header, make sure the -sys crate is buildable and linkable like any other Rust crate, and create a wrapper crate that provides an idiomatic, ergonomic, and above all safe API to your C code. Sometimes, that means analyzing code and documentation like we did, sometimes you may need to introduce runtime checks, but in all situations, it takes a developer's brain to ensure the wrappers are correct. If you're working with a lower-level language like C, you get control, but you get responsibility, too. In my next blog in this series, we'll have a look at a higher-level language for which the tooling takes many concerns out of our hands: Python. See you there!

All code examples from the Rust Interop Guide can be found in this repo.


[1] Bernstein, D. J., van Gastel, B., Janssen, W., Lange, T., Schwabe, P., & Smetsers, S. (2015). TweetNaCl: A crypto library in 100 tweets. In D. F. Aranha & A. Menezes (Eds.), Progress in Cryptology - LATINCRYPT 2014 (Third International Conference on Cryptology and Information Security in Latin America, Florianópolis, Brazil, September 17-19, 2014, Revised Selected Papers) (pp. 64-83). Lecture Notes in Computer Science, Vol. 8895. Springer. https://doi.org/10.1007/978-3-319-16295-9_4
