Mix in C with Rust: A taste of C in your Rust
This article is part of our Rust Interop Guide.
C has been around for ages, and there are libraries for just about everything in C. Rust is much younger, and you'll have found that the Rust ecosystem is not as mature as C's by far. It's nice to be able to use what's already out there. On top of that, C being the lingua franca of FFI, closed-source libraries are often distributed as a binary blob exposing an interface based on the C ABI, along with a C header. To use them, your Rust application will have to speak some C when including such libraries. Being able to use libraries written in C from Rust is a great skill to have. Below, we'll have a look at how to go about that.
Bring on the C
If you want to call into C from Rust, probably the best way to go about it is to use Bindgen.
The C type system being far more bare-bones than Rust's, tooling that generates Rust bindings from C code needs a lot of assistance from the developer. We developers need to employ whatever we know about the library to wrap and augment the generated bindings. To show how this works, we're going to wrap TweetNaCl, a relatively small cryptography library written in C. If you're interested, you can find the source code here. One of the contributors already did the work of wrapping TweetNaCl in Rust, in a crate called tweetnacly. Examples in this post are based on his work.
All right, let's take the TweetNaCl header file and unleash Bindgen on it. You can use Bindgen as a CLI app called bindgen-cli, or as a library that is invoked in your build script. We'll do the latter. Let's create a crate for the generated bindings. We'll call it tweetnacl-sys, the -sys suffix being the conventional marker for a crate that holds raw bindings generated from non-Rust code:
$ cargo new --lib tweetnacl-sys
$ cd tweetnacl-sys
Create three files in the root of the just-created crate: build.rs, tweetnacl.c and tweetnacl.h.
Fill the latter two files with the original TweetNaCl source and header:
$ curl -s https://tweetnacl.cr.yp.to/20140427/tweetnacl.c -o tweetnacl.c
$ curl -s https://tweetnacl.cr.yp.to/20140427/tweetnacl.h -o tweetnacl.h
The file structure should look like this:
$ tree
.
├── build.rs
├── Cargo.toml
├── src
│   └── lib.rs
├── tweetnacl.c
└── tweetnacl.h
Now for the build script. Let's start by adding Bindgen as build dependency:
$ cargo add --build bindgen
Now we configure Bindgen in the build.rs file:
fn main() {
    // Re-run this build script only when the C source or header changes.
    println!("cargo:rerun-if-changed=tweetnacl.c");
    println!("cargo:rerun-if-changed=tweetnacl.h");

    // Generate Rust declarations from the C header.
    let bindings = bindgen::builder()
        .header("tweetnacl.h")
        .generate()
        .expect("Unable to generate bindings to tweetnacl.h");

    // Cargo sets OUT_DIR for build scripts; generated files belong there.
    let out_path = std::env::var("OUT_DIR").unwrap();
    let out_path = std::path::Path::new(&out_path);
    bindings
        .write_to_file(out_path.join("tweetnacl_bindings.rs"))
        .expect("Couldn't write bindings to tweetnacl.h!");
}
Pretty straightforward. The OUT_DIR environment variable points to a folder where the build script is supposed to write its output. If you compile your crate with cargo build, the generated bindings end up in target/debug/build/tweetnacl-sys-[SOME_BIG_HASH]/out/tweetnacl_bindings.rs. Here's a glimpse of the contents:
[...]
pub const crypto_verify_16_tweet_VERSION: &[u8; 2] = b"-\0";
pub const crypto_verify_16_BYTES: u32 = 16;
pub const crypto_verify_16_VERSION: &[u8; 2] = b"-\0";
pub const crypto_verify_16_IMPLEMENTATION: &[u8; 23] = b"crypto_verify/16/tweet\0";
pub const crypto_verify_32_tweet_BYTES: u32 = 32;
pub const crypto_verify_32_tweet_VERSION: &[u8; 2] = b"-\0";
pub const crypto_verify_32_BYTES: u32 = 32;
pub const crypto_verify_32_VERSION: &[u8; 2] = b"-\0";
pub const crypto_verify_32_IMPLEMENTATION: &[u8; 23] = b"crypto_verify/32/tweet\0";
extern "C" {
    pub fn crypto_auth_hmacsha512256_tweet(
        arg1: *mut ::std::os::raw::c_uchar,
        arg2: *const ::std::os::raw::c_uchar,
        arg3: ::std::os::raw::c_ulonglong,
        arg4: *const ::std::os::raw::c_uchar,
    ) -> ::std::os::raw::c_int;
}
[...]
Yeah... The header consists of a load of #defines and extern functions, and the generated bindings are literal translations of the contents of tweetnacl.h. Not very usable at all. But there's another problem: the Rust compiler is unable to build C source code. That's going to be a problem when we include tweetnacl-sys into our Rust application: the linker is going to need the TweetNaCl build artifacts in order to finalize the build process. To make our crate easily buildable and includable like any other Rust crate, we'll have to instruct the system's default C compiler to compile TweetNaCl to a static archive, and instruct Rust to link it statically. Invoking the C compiler can be done using the cc crate. Let's add it to our build dependencies:
$ cargo add --build cc
In your build script, add the following lines to the end of fn main:
    cc::Build::new()
        .warnings(false)
        .extra_warnings(false)
        .file("tweetnacl.c")
        .compile("tweetnacl");
We're suppressing all warnings that result from compiling the C source here; whether that's acceptable depends on how much you trust the C code. In our case, TweetNaCl is very much optimized for source code size and auditability, and compiling it does yield a whole bunch of warnings we're conveniently not going to go into. But what's cool is that the target/debug/build/tweetnacl-sys-[SOME_BIG_HASH]/out/ directory now contains an archive file called libtweetnacl.a (on Unix-like targets), which the linker can use to stitch together the final binary! The cc crate also instructs Cargo to link the just-compiled archive into the final binary. So linking's sorted too!
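To make the linking step a little less magical: as far as I know, what cc does after compiling boils down to printing a couple of Cargo directives from the build script. The sketch below rebuilds such lines by hand. The helper name link_directives is made up for illustration, and the exact output of cc may differ between versions.

```rust
use std::path::Path;

/// Hypothetical helper (not part of the `cc` crate): builds the two
/// directive lines a build script can print to make Cargo link a static
/// archive that lives in `out_dir`.
fn link_directives(out_dir: &Path, lib_name: &str) -> [String; 2] {
    [
        // Tell rustc where to search for native libraries.
        format!("cargo:rustc-link-search=native={}", out_dir.display()),
        // Tell rustc to link the archive statically; on Unix-like targets
        // the file is expected to be named `lib<lib_name>.a`.
        format!("cargo:rustc-link-lib=static={lib_name}"),
    ]
}
```

A build script would print each of these lines with println!, which is how Cargo picks them up.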
We need to do one more thing to finalize our tweetnacl-sys crate: actually make the generated bindings available from Rust. Here's what you put in src/lib.rs to do so:
#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]
include!(concat!(env!("OUT_DIR"), "/tweetnacl_bindings.rs"));
As you can see, it just uses the include! macro to include the generated bindings directly. If you have cargo-expand installed, you can run cargo expand to see the expansion. As the bindings are ugly and violate Rust's code style rules, we have to suppress some warnings about that, using the #![allow(...)] attributes. Done! We have a crate that we can simply include as a dependency in our application's Cargo.toml file, and Cargo will happily build it for us! Are we happy, though? Well, you shouldn't be. Because directly interacting with the sys crate is still going to be a horrible experience. This is where you come in!
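For a small taste of that rawness: the string constants in the generated bindings are NUL-terminated byte arrays, not &strs. Here's a rough sketch of what it takes to read one. The constant is copied from the glimpse of the bindings above; the helper c_const_to_str is a made-up name, not something Bindgen generates.

```rust
use std::ffi::CStr;

// Copied from the generated bindings; in a real project this would come
// from the sys crate instead of being redefined here.
#[allow(non_upper_case_globals)]
pub const crypto_verify_16_IMPLEMENTATION: &[u8; 23] = b"crypto_verify/16/tweet\0";

/// Hypothetical helper: views a NUL-terminated byte constant as a `&str`.
/// Returns `None` if the terminator is missing, if there is an interior
/// NUL byte, or if the bytes are not valid UTF-8.
fn c_const_to_str(bytes: &'static [u8]) -> Option<&'static str> {
    CStr::from_bytes_with_nul(bytes).ok()?.to_str().ok()
}
```

Every consumer of the raw constants would have to repeat this kind of conversion, which is exactly the sort of chore a wrapper crate can take away.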
Wraptastic!
Let's set up another crate. It's going to be called tweetnacl, and it'll serve as an ergonomic wrapper around tweetnacl-sys:
$ cd ..
$ cargo new --lib tweetnacl
$ cd tweetnacl
$ cargo add --path ../tweetnacl-sys
One of the functions that Bindgen generated is crypto_hash_sha512_tweet, and it's defined like this:
extern "C" {
    pub fn crypto_hash_sha512_tweet(
        arg1: *mut ::std::os::raw::c_uchar,
        arg2: *const ::std::os::raw::c_uchar,
        arg3: ::std::os::raw::c_ulonglong,
    ) -> ::std::os::raw::c_int;
}
The parameter names were made up by Bindgen, and are not very descriptive. But we can have a look at the implementation, which is given by crypto_hash in tweetnacl.c; a chain of #defines in tweetnacl.h maps that name to crypto_hash_sha512_tweet. Took me a while to figure out how that works, but let's say I'm glad Rust's macros are less powerful than C's #define.
int crypto_hash(u8 *out,const u8 *m,u64 n)
{
  u8 h[64],x[256];
  u64 i,b = n;

  FOR(i,64) h[i] = iv[i];

  crypto_hashblocks(h,m,n);
  m += n;
  n &= 127;
  m -= n;

  FOR(i,256) x[i] = 0;
  FOR(i,n) x[i] = m[i];
  x[n] = 128;

  n = 256-128*(n<112);
  x[n-9] = b >> 61;
  ts64(x+n-8,b<<3);
  crypto_hashblocks(h,x,n);

  FOR(i,64) out[i] = h[i];

  return 0;
}
If you're wondering about the u8 and u64, these are typedefs declared at the top of the file. The FOR(i, n) is a macro, also defined at the top, and it expands to for (i = 0;i < n;++i). Why? Probably because it's shorter. So anyway, the statement that follows FOR(i, n) is repeated n times. Apart from that, we note a couple of things concerning this function:
- It never returns anything other than 0. Instead, it writes to the memory location given by the out pointer parameter. So that's the actual return value of this function.
- out is only referred to in a single statement, FOR(i,64) out[i] = h[i];, where a single byte is written to it 64 times. out is therefore 64 bytes in size.
- A hash function needs an input of variable size, and as we're now certain that out isn't it, that leaves m, and n is going to be its length.
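These three observations translate into Rust terms fairly directly. As a sketch, here is how a Rust function with the same pointer-based shape could view its arguments. Both c_shaped and its toy behaviour (filling the output with the first input byte) are invented for illustration and have nothing to do with SHA-512.

```rust
use std::slice;

/// Invented example with the same parameter shape as `crypto_hash`:
/// `out` must point to 64 writable bytes, `m` to `n` readable bytes.
unsafe fn c_shaped(out: *mut u8, m: *const u8, n: u64) -> i32 {
    // The (m, n) pair is what Rust would express as a `&[u8]` slice...
    let input: &[u8] = unsafe { slice::from_raw_parts(m, n as usize) };
    // ...and `out` is what Rust would express as a `&mut [u8; 64]`.
    let output: &mut [u8; 64] = unsafe { &mut *out.cast::<[u8; 64]>() };
    // Toy behaviour, not a hash: fill the output with the first input byte,
    // or with zeroes when the input is empty.
    output.fill(input.first().copied().unwrap_or(0));
    // Like `crypto_hash`, always report success.
    0
}
```

The size and validity requirements that the slice and array types would carry automatically only live in comments here, which is the gap a safe wrapper has to close.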
We could also have had a look at the documentation, but where's the fun in that? Now, in Rust, we'd do things a bit differently. First off, if the return value is going to be a series of bytes with a fixed length, we'd return an array. Secondly, we'd pass slices for variable length input. Therefore, this is what we'll put in src/lib.rs in the tweetnacl crate:
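pub fn hash_sha512(bytes: &[u8]) -> [u8; 64] {
    // Zero-initialized buffer that the C function fills with the hash.
    let mut out = [0u8; 64];
    // SAFETY: `out` is valid for 64 bytes of writes, and `bytes` is valid
    // for `bytes.len()` bytes of reads. The return value is always 0.
    unsafe {
        tweetnacl_sys::crypto_hash_sha512_tweet(
            out.as_mut_ptr(),
            bytes.as_ptr(),
            bytes.len() as u64,
        )
    };
    out
}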
Look at that signature. Way better, right? The function could do with a bit of documentation, but hey, it's a big improvement already. Just one thing is bugging me still: we're initializing a big array here, and overwriting its contents before ever reading from it. Can we improve things?
Uninit
Yes, we can! Working with uninitialized memory is one of the things that freak out a Rust developer a bit. Luckily for us, the standard library provides a well-documented way to construct uninitialized instances of data: std::mem::MaybeUninit<T>. Using it may still cause Undefined Behaviour, so we'll be cautious. For instance, MaybeUninit is a footgun when used with bools or references, as not every bit pattern is a valid value for those types. Initializing a reference with zeroed memory, or a bool with a value other than 0 or 1, is instant Undefined Behaviour! Even if you never use the value. The compiler relies on guarantees about the values certain types can have to optimize your code. Luckily for us, an array of bytes contains no padding and, as long as the size is correct, may contain any value. So if we do the following in src/lib.rs, we're in the clear:
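use std::mem::MaybeUninit;

pub fn hash_sha512(bytes: &[u8]) -> [u8; 64] {
    let mut out: MaybeUninit<[u8; 64]> = MaybeUninit::uninit();
    // SAFETY: `out` is valid for 64 bytes of writes, `bytes` is valid for
    // `bytes.len()` bytes of reads, and the C function writes all 64 bytes
    // of `out` before returning.
    unsafe {
        tweetnacl_sys::crypto_hash_sha512_tweet(
            out.as_mut_ptr().cast(),
            bytes.as_ptr(),
            bytes.len() as u64,
        );
        out.assume_init()
    }
}
With MaybeUninit::uninit(), we're reserving stack memory to write the hash to without initializing it, and once crypto_hash_sha512_tweet is done, we can assume that out is actually initialized, which is what the call to out.assume_init() does.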
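The same pattern carries over to types that do have invalid bit patterns, such as the bool mentioned earlier, provided the callee is guaranteed to write a valid value before we call assume_init. A minimal sketch, in which get_flag is a made-up stand-in for a C function and not part of TweetNaCl:

```rust
use std::mem::MaybeUninit;

/// Made-up stand-in for a C function that reports a flag through an out
/// pointer, along the lines of `void get_flag(bool *out)`.
unsafe fn get_flag(out: *mut bool) {
    // Store a valid `bool` without reading the uninitialized old value.
    unsafe { out.write(true) };
}

fn read_flag() -> bool {
    let mut flag = MaybeUninit::<bool>::uninit();
    // SAFETY: `flag.as_mut_ptr()` is valid for writes, and `get_flag`
    // always stores a valid `bool` before returning, so `flag` is
    // initialized by the time `assume_init` is called.
    unsafe {
        get_flag(flag.as_mut_ptr());
        flag.assume_init()
    }
}
```

If the callee could return without writing, or could write a byte other than 0 or 1, calling assume_init here would be Undefined Behaviour, so this only holds as long as that assumption about the callee does.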
Will it blend?
The litmus test: trying it out. You should now be able to use the tweetnacl crate in your Rust project, and try it out that way. Myself, I'm going to create a unit test, which, if it works, should prove the wrapper is correct. I tried a random online SHA512 hasher tool, and as it turns out, the hash of "Hello, world!" is 0xc1527cd893c124773d811911970c8fe6e857d6df5dc9226bd8a160614c0cd963a4ddea2b94bb7d36021ef9d865d5cea294a82dd49a0bb269f51f6e7a57f79421. Let's verify that in our test. In tweetnacl/src/lib.rs:
#[cfg(test)]
mod test {
    use crate::hash_sha512;

    #[test]
    fn it_hashes() {
        let bytes = b"Hello, world!";
        let the_hash = hash_sha512(bytes);
        assert_eq!(
            the_hash,
            [
                0xc1, 0x52, 0x7c, 0xd8, 0x93, 0xc1, 0x24, 0x77, 0x3d, 0x81, 0x19, 0x11, 0x97, 0xc,
                0x8f, 0xe6, 0xe8, 0x57, 0xd6, 0xdf, 0x5d, 0xc9, 0x22, 0x6b, 0xd8, 0xa1, 0x60, 0x61,
                0x4c, 0xc, 0xd9, 0x63, 0xa4, 0xdd, 0xea, 0x2b, 0x94, 0xbb, 0x7d, 0x36, 0x2, 0x1e,
                0xf9, 0xd8, 0x65, 0xd5, 0xce, 0xa2, 0x94, 0xa8, 0x2d, 0xd4, 0x9a, 0xb, 0xb2, 0x69,
                0xf5, 0x1f, 0x6e, 0x7a, 0x57, 0xf7, 0x94, 0x21,
            ]
        );
    }
}
Run the test:
$ cargo test -q
running 1 test
.
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
That's a success!
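Typing out 64 byte literals by hand is error-prone, so for further tests it may be nicer to paste the hex digest as a string and decode it. A small sketch of such a decoder follows; hex_to_bytes is a hypothetical test helper of my own, not something the tweetnacl crate or the standard library provides.

```rust
/// Hypothetical test helper: decodes a hex string into bytes.
/// Returns `None` if the length is odd or a non-hex character is found.
fn hex_to_bytes(hex: &str) -> Option<Vec<u8>> {
    // Reject odd lengths and anything that is not an ASCII hex digit,
    // including the leading '+' that `from_str_radix` would accept.
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Decode two hex digits at a time into one byte.
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}
```

In a test, the decoded Vec<u8> for the full digest could then be compared against the_hash.to_vec().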
Conclusion
There are loads of functions exposed by tweetnacl.h to give the same treatment, but I think the general idea should be clear: you run Bindgen on your C header, make sure the -sys crate is buildable and linkable like any other Rust crate, and create a wrapper crate that provides an idiomatic, ergonomic, and above all safe API to your C code. Sometimes that means analyzing code and documentation like we did, and sometimes you may need to introduce runtime checks, but in all situations it takes a developer's brain to ensure the wrappers are correct. If you're working with a lower-level language like C, you get control, but you get responsibility, too. In my next blog post in this series, we'll have a look at a higher-level language for which the tooling takes many concerns out of our hands: Python. See you there!
All code examples from the Rust Interop Guide can be found in this repo.