Rust interop in practice: speaking Python and Javascript

We've been writing how-tos about using Rust in existing C, Python, and C++ projects. This article takes a different angle and shows an in-production example of Rust interoperability: recently I worked on exposing the TSP Rust API to Python and NodeJS users.

This article is part of our Rust Interop Guide.

All three languages have a way of "speaking C": using the way that C represents data to ship values from one language to another. And we have solid tooling for defining the interaction between these languages.

But there are big differences in how these languages represent data, so the story is more complicated than just transferring the bits. The real challenge is designing an API in the target language that feels nice to use.

Running example

Our running example will be the Store::open_message method:

#[derive(Debug)]
pub enum ReceivedTspMessage {
    GenericMessage {
        sender: String,
        nonconfidential_data: Option<Vec<u8>>,
        message: Vec<u8>,
        message_type: MessageType,
    },
    RequestRelationship {
        sender: String,
        // ... more fields
    },
    // ... more variants
}

impl Store {
    pub fn open_message(
        &self, 
        message: &mut [u8],
    ) -> Result<ReceivedTspMessage, Error> {
        // ...
    }
}

The exact details of the logic are not important: what we have here is a function used in production-grade code that takes an argument by-reference (i.e. lifetimes are at play) and returns a complex Rust enum.

These two concepts are interesting because they cannot be represented natively in C, and are not handled automatically by FFI tooling.

Python

For interaction between Rust and Python, PyO3 is the standard tool. It really does make the FFI story very simple, though sometimes at a minor performance cost.

To give PyO3 the information it needs, we wrap our internal data types in wrapper types annotated with the right PyO3 attributes, e.g.:

#[pyclass]
struct Store(tsp::Store);

#[pymethods]
impl Store { 
    fn open_message(
        &self, 
        mut message: Vec<u8>
    ) -> PyResult<FlatReceivedTspMessage> {
        self.0
            .open_message(&mut message)
            .map_err(py_exception)
            .map(FlatReceivedTspMessage::from)
    }
}

Note that the signature of the method has changed:

  • the message argument is now an owned Vec<u8> instead of a borrowed &mut [u8]
  • the return type is a PyResult<T> instead of a Result<T, E>
  • the return type uses FlatReceivedTspMessage instead of ReceivedTspMessage

The message argument is owned because messing with lifetimes is annoying in this API layer. Instead of trying to make that work, we accept the performance hit of the value being cloned before the function gets called.

Using PyResult as the return type makes PyO3 automatically turn Rust errors into Python exceptions, the idiomatic solution to errors in Python.

Finally, the most interesting change is the use of FlatReceivedTspMessage. This type solves the problem that Python (and C) don't understand Rust enums where the variants themselves contain values. Recent Python versions do actually have a way to represent these types and work with them in an ergonomic way, but for the moment PyO3 cannot make this translation automatically.

But never fear, we can simplify the data structure on the Rust side, send it over the boundary, and then build it back up again into something that feels nice to use on the Python side.

The first step is to extract the tag or variant of the enum into its own type. Such values can be sent over the boundary, and modern Python versions have an enum concept similar to C: a Python enum is a collection of integer values with human-readable names.

#[pyclass]
#[derive(Debug, Clone, Copy)]
enum ReceivedTspMessageVariant {
    GenericMessage,
    RequestRelationship,
    // ... other variants
}

impl From<&tsp::ReceivedTspMessage> for ReceivedTspMessageVariant {
    fn from(value: &tsp::ReceivedTspMessage) -> Self {
        match value {
            tsp::ReceivedTspMessage::GenericMessage { .. } => 
                Self::GenericMessage,
            tsp::ReceivedTspMessage::RequestRelationship { .. } => 
                Self::RequestRelationship,
            // ... other variants
        }
    }
}
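Seen from Python, the extracted tag behaves much like a standard library enum. The sketch below is only an analogy written with enum.IntEnum; it is not the class that PyO3 generates, and the integer values and the describe helper are assumptions made for illustration.

```python
from enum import IntEnum


class ReceivedTspMessageVariant(IntEnum):
    """Stand-in for the tag type; the numeric values are assumed."""

    GenericMessage = 0
    RequestRelationship = 1
    # ... other variants


def describe(variant: ReceivedTspMessageVariant) -> str:
    # A tag carries no payload: it is just a named integer.
    return f"{variant.name} (tag {int(variant)})"


print(describe(ReceivedTspMessageVariant.GenericMessage))
```

Because the tag has no payload, it crosses the language boundary as a plain value; the data that belongs to each variant has to travel separately.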

The trickier bit is the data. The FlatReceivedTspMessage struct contains a field for the variant, and optional fields for the possible fields of all of the variants.

#[pyclass]
#[derive(Debug)]
struct FlatReceivedTspMessage {
    #[pyo3(get, set)]
    variant: ReceivedTspMessageVariant,
    #[pyo3(get, set)]
    sender: Option<String>,
    #[pyo3(get, set)]
    nonconfidential_data: Option<Option<Vec<u8>>>,
    #[pyo3(get, set)]
    message: Option<Vec<u8>>,
    #[pyo3(get, set)]
    message_type: Option<MessageType>,
    // ... other fields
}

All variant fields are wrapped in a Rust Option, and set to None by default. For a particular variant, the fields that it uses are set to Some(_):

let mut this = FlatReceivedTspMessage::default();

match value {
    tsp::ReceivedTspMessage::GenericMessage {
        sender,
        nonconfidential_data,
        message,
        message_type,
    } => {
        this.sender = Some(sender);
        this.nonconfidential_data = Some(nonconfidential_data);
        this.message = Some(message);
        this.message_type = match message_type {
            tsp::definitions::MessageType::Signed => Some(MessageType::Signed),
            tsp::definitions::MessageType::SignedAndEncrypted => {
                Some(MessageType::SignedAndEncrypted)
            }
        };
    }
    // ... other variants
}

In Python, the Rust Option::None fields are represented as None. In modern Python (3.10 and later), we can use pattern matching to turn this flattened data into something more idiomatic.

class ReceivedTspMessage:
    @staticmethod
    def from_flat(msg: FlatReceivedTspMessage):
        match msg.variant:
            case ReceivedTspMessageVariant.GenericMessage:
                return GenericMessage(msg.sender, msg.nonconfidential_data, bytes(msg.message), msg.message_type)

            case ReceivedTspMessageVariant.RequestRelationship:
                return RequestRelationship(msg.sender, msg.route, msg.nested_vid, msg.thread_id)

            # ... other variants

@dataclass
class GenericMessage(ReceivedTspMessage):
    sender: str # it's still python: types are lies
    nonconfidential_data: str
    message: bytes  # from_flat wraps the payload in bytes(...)
    message_type: MessageType

@dataclass
class RequestRelationship(ReceivedTspMessage):
    sender: str
    route: str
    nested_vid: str
    thread_id: str

We can then use this functionality to create an ergonomic Store type:

class Store:
    def __init__(self):
        self.inner = tsp_python.Store()

    def seal_message(self, *args, **kwargs):
        return self.inner.seal_message(*args, **kwargs)

    def open_message(self, *args, **kwargs):
        flat_message = self.inner.open_message(*args, **kwargs)
        return ReceivedTspMessage.from_flat(flat_message)

Unfortunately, we can't inherit from tsp_python.Store because of some PyO3 details.
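Since the wrapper uses composition, every method has to be forwarded by hand. One possible way to reduce that boilerplate is to forward unknown attributes to the inner object with __getattr__. This is a sketch of a design option, not what the TSP bindings do; FakeInnerStore is a hypothetical stand-in for tsp_python.Store so that the example runs on its own.

```python
class FakeInnerStore:
    """Hypothetical stand-in for tsp_python.Store."""

    def add_private_vid(self, vid):
        return f"added {vid}"

    def open_message(self, message):
        return {"flat": bytes(message)}


class Store:
    def __init__(self, inner=None):
        self.inner = inner if inner is not None else FakeInnerStore()

    def open_message(self, *args, **kwargs):
        # Methods that need post-processing are still wrapped explicitly.
        flat_message = self.inner.open_message(*args, **kwargs)
        return ("unflattened", flat_message)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so explicit wrappers win.
        if name == "inner":
            # Avoid infinite recursion if `inner` was never set.
            raise AttributeError(name)
        return getattr(self.inner, name)


store = Store()
print(store.add_private_vid("alice"))  # forwarded to the inner store
print(store.open_message(b"x"))        # handled by the explicit wrapper
```

The trade-off is that forwarded methods are invisible to static tooling and autocompletion, so explicit forwarding methods may still be the better choice for a public API.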

Nevertheless, we can now write code that closely resembles the original Rust but also looks quite Pythonic:

class AliceBob(unittest.TestCase):
    def setUp(self):
        self.store = Store()
        self.alice = new_vid()
        self.bob = new_vid()

        self.store.add_private_vid(self.alice)
        self.store.add_private_vid(self.bob)

    def test_open_seal(self):
        message = b"hello world"

        url, sealed = self.store.seal_message(self.alice.identifier(), self.bob.identifier(), None, message)

        self.assertEqual(url, "tcp://127.0.0.1:1337")

        received = self.store.open_message(sealed)

        match received:
            case GenericMessage(sender, _, received_message, message_type):
                self.assertEqual(sender, self.alice.identifier())
                self.assertEqual(received_message, message)
                self.assertEqual(message_type, MessageType.SignedAndEncrypted)

            case other:
                self.fail(f"unexpected message type {other}")

Javascript

Next, let's look at the same idea, but for Javascript.

At first glance, Javascript and Python have many similarities: they are dynamically typed, originally object oriented, and have interesting notions about variable scope. However, it turns out that there are many subtle but crucial differences. In general I'd say that Javascript is significantly harder to work with.

Our tool of choice for creating the Javascript bindings is wasm-bindgen: our Rust interface is compiled to WebAssembly, which is then loaded by the Javascript runtime to make the functions accessible. As with Python, we need some wrappers and attributes to give wasm-bindgen the information it needs.

#[derive(Default, Clone)]
#[wasm_bindgen]
pub struct Store(tsp::Store);

#[wasm_bindgen]
impl Store {
    #[wasm_bindgen]
    pub fn open_message(
        &self, 
        mut message: Vec<u8>
    ) -> Result<FlatReceivedTspMessage, Error> {
        self.0
            .open_message(&mut message)
            .map(FlatReceivedTspMessage::from)
            .map_err(Error)
    }
}

The code is very similar to the Python wrapper, for similar reasons. But this time there is a bit more work to do besides defining the FlatReceivedTspMessage and its conversion functions: in order to access the FlatReceivedTspMessage fields on the Javascript side, we need to define explicit getters:

#[wasm_bindgen]
impl FlatReceivedTspMessage {
    #[wasm_bindgen(getter)]
    pub fn sender(&self) -> Option<String> {
        self.sender.clone()
    }
}

Again we need some extra wrapping on the Javascript side.

class Store {
    constructor() {
        this.inner = new wasm.Store();
    }

    open_message(...args) {
        const flatMessage = this.inner.open_message(...args);
        return ReceivedTspMessage.fromFlat(flatMessage);
    }
}

And with all of that in place, we can again write a program that closely resembles the structure of the original Rust:

let store = new Store();

let alice = new_vid();
let bob = new_vid();

let alice_identifier = alice.identifier();
let bob_identifier = bob.identifier();

store.add_private_vid(alice);
store.add_private_vid(bob);

let message = "hello world";

let { url, sealed } = store.seal_message(alice_identifier, bob_identifier, null, message);

assert.strictEqual(url, "tcp://127.0.0.1:1337");

let received = store.open_message(sealed);

if (received instanceof GenericMessage) {
    const { sender, message: messageBytes, message_type } = received;
    assert.strictEqual(sender, alice_identifier, "Sender does not match Alice's identifier");
    let receivedMessage = String.fromCharCode.apply(null, messageBytes);
    assert.strictEqual(receivedMessage, message, "Received message does not match");
    assert.strictEqual(message_type, MessageType.SignedAndEncrypted, "Message type does not match SignedAndEncrypted");
} else {
    assert.fail(`Unexpected message type: ${received}`);
}

There is a bit more friction though, such as the clunky conversion between Uint8Array and String for a comparison, and a lack of pattern matching.

An especially subtle issue is that when the Rust code uses an argument by-value, then in Javascript this value is not cloned (like with PyO3), but rather the Javascript value is invalidated. Trying to use the Javascript value after this point will throw an unhelpful runtime exception.

So strictly speaking the JS approach is more efficient, but it's also easier to mess up, causing errors that can be hard to track down.
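To make the difference concrete, here is a toy model of the two behaviors, written in Python for consistency with the earlier sketches. It does not use wasm-bindgen or PyO3; MovedHandle, take, clone, and this add_private_vid are hypothetical names that only simulate a handle being emptied when it is passed by value.

```python
class MovedHandle:
    """Toy model of a binding object whose ownership can move to a callee."""

    def __init__(self, value):
        self._value = value
        self._moved = False

    def _check(self):
        if self._moved:
            raise RuntimeError("handle used after its value was moved")

    def take(self):
        # Models passing by value: the caller's handle is emptied.
        self._check()
        self._moved = True
        value, self._value = self._value, None
        return value

    def clone(self):
        # Explicit copy: the price for keeping the original usable.
        self._check()
        return MovedHandle(self._value)


def add_private_vid(store, vid):
    # The callee takes ownership of whatever handle it receives.
    store.append(vid.take())


store = []
alice = MovedHandle("alice")
add_private_vid(store, alice.clone())  # copy first: alice stays valid
add_private_vid(store, alice)          # alice is consumed here
try:
    alice.take()
except RuntimeError as err:
    print(err)
```

This is presumably also why the Javascript example above reads alice.identifier() and bob.identifier() before handing the VIDs to the store.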

Conclusion

Overall, we are happy with the quality of the API that we were able to create with relatively little effort. The tools, both PyO3 and wasm-bindgen, are solid but have some gotchas. With PyO3 it is unclear when clones occur. With wasm-bindgen clones never happen implicitly, but that has ergonomic downsides when values move out from under you.

There is a learning curve, but the tools are mature and ready for production-grade projects. For instance, with both Python and Javascript, we're able to build efficient async implementations on top of the exposed Rust primitives that hook into the native async machinery.
