3K, 60fps, 130ms: achieving it with Rust

How we chose the Rust programming language to advance the state-of-the-art in real-time communication

Jake McGinty

June 15, 2020

This post was written collectively with Ryo Kawaguchi, Andrea Law, and Brian Schwind.

Our goal for tonari is to build a virtual doorway to another space that allows for truly natural human interactions. Nearly two years in development, tonari is, to the best of our knowledge, the lowest-latency high resolution production-ready "teleconferencing" (we are truly not fond of that word) product available.

130ms glass-to-glass latency (the time from light hitting the camera to when it appears on-screen on the other side)
3K, 60fps video transmission
High-bitrate 48kHz stereo audio

Compare this to the typical 315-500ms latency for Zoom and WebRTC, as measured between two laptops (X1 Carbon and MacBook Pro) on the same network at our office. It's a huge difference. It's the difference between constantly interrupting each other versus having a natural flow of conversation. It's the difference between a blurry face from a camera seemingly pointed up someone's nose versus a wide-view high fidelity image that smoothly transfers all the subtle body language of an in-person conversation.

Since launching our first pilot in February, we've experienced no software-related downtime (tripping over ethernet cables is a different story). And as much as we would love to think we're infallible engineers, we truly don't believe we could have achieved these numbers with this level of stability without Rust.

In the beginning (or: why we're not WebRTC)

The very first tonari proof-of-concept used a basic projector, bluetooth speakers, and a website running on top of vanilla WebRTC (JavaScript). We've come a long way since those days.

While that prototype (and our opinionated vision of the future) got us grant funding, we knew that tonari would be dead on arrival unless we could achieve significantly lower latency and higher fidelity than WebRTC, two things that aren't usually associated with video chat in 2020.

We figured, “Okay, so we can just modify WebRTC directly and wrap it up with a slick UI in C++ and launch it in no time.”

A week of struggling with WebRTC’s nearly 750,000 LoC behemoth of a codebase revealed just how painful a single small change could be: how hard it was to test the code we were dealing with, and to feel truly safe changing it.

Let there be light...weight code

So in a furious (read: calm and thoroughly-discussed) rage quit we decided it was easier to re-implement the whole stack from scratch. We wanted to know and understand every line of code being run on our hardware, and we wanted that code designed for exactly the hardware we had chosen.

Thus began our journey to the depths beyond high-level interfaces like a browser or existing RTC project, and into the world of low-level systems and hardware interaction from scratch.

We needed it to be inherently secure to protect the privacy of those who use tonari. We needed it to be performant to make it feel as human and real-time as possible. And we needed it to be maintainable as the code becomes more mature, as new brains show up and have to learn our work and expand on it.

We discussed and ruled out a handful of alternative approaches:

Security: C and C++ are memory- and concurrency-unsafe, and their disparate and seemingly infinite build systems make it hard to have a consistent and simple development experience.
Performance: Java, C#, and Go's memory management is opaque and can be difficult to work with in latency-sensitive applications where you want full control over your memory.
Maintainability: Haskell, Nim, D, and a handful of other more bespoke languages tend to be more limited in tooling, community, and hire-ability.

Rust is really the only production-ready language that we found confidently satisfies these needs.

Finding beauty in Rust

Rust's beauty lies in the countless decisions made by the development community that constantly make you feel like you can have ten cakes and eat all of them too.

Its build system is opinionated, and cleanly designed. It is itself a complete ecosystem that makes introducing new engineers to your project and setting up dev environments remarkably simple.
The memory and concurrency safety guarantees cannot be over-appreciated. We're confident that we wouldn't have done our first deployment yet if we had continued this in C++ - we'd still probably be stuck on subtle snags.
Our ability to interact at the lowest level with hardware via APIs like CUDA, oftentimes through existing crates (Rust's term for a code library), has allowed us to have higher standards about the latency we want from our first production release.

As tonari is getting more advanced, we're now choosing embedded microcontrollers whose firmware can be written in Rust so we don't have to leave our idyllic utopia into the old world of unsafe system programming.

Crates we rely on

We're not going to cat Cargo.toml here, instead focusing on some select crates that have earned the prestigious award of a lifetime invitation to each of our birthday parties forever.

"Better-than-std" crates

crossbeam is better for inter-thread communication than std::sync::mpsc in almost every way, and may be merged into std eventually.
parking_lot has a mutex implementation better than std::sync::Mutex in almost every way, and may be merged into the standard library (one day). It also provides many other useful synchronization primitives.
bytes is a more robust, and often more performant, way to play with bytes compared to Vec<u8>.
socket2 is what you will end up at if you are ever doing lower-level networking optimizations.

Beauty supply

fern is a dead-simple way to customize and prettify your logging output. We use it to keep our logs readable and internally standardized.
structopt is how you always dreamed CLI arguments would be handled. There's no reason not to use it unless you're going for bare-minimum dependencies.

Cargo cult classics

cargo-release allows us to cut internal releases painlessly.
cargo-udeps identifies unused dependencies and allows us to keep our build times minimal.
cargo tree (recently integrated in cargo) shows a dependency tree that's useful in many ways, but mainly in identifying ways to minimize dependencies.
cargo-geiger helps us quickly evaluate external dependencies for possible security (or correctness) concerns.
cargo-flamegraph helps us enormously when tracking down performance hot-spots in our code.

Project structure

The tonari codebase is a monorepo. At its root we have a Cargo workspace with a binaries crate, and a number of supporting library crates.

Having our crates in one repo makes them easy to reference in our binaries crate without needing to publish to crates.io or get too fancy with specifying git dependencies in our Cargo.toml. When the time comes to publish these libraries as open source, it's trivial to break it out into its own repo.

Library, binary, why not both?

We have one main library crate that contains a unified API for talking to hardware, media codecs, network protocols, etc. Outside of that private API, we also have standalone crates in our workspace that we consider candidates for open-sourcing. For example, we’ve written our own actor framework fit for long-running high-throughput actors, as well as our own network protocol for reliable, high-bandwidth, low-latency media streaming.

We use separate binaries for different parts of the tonari system and each of these lives in binaries, a combination library/binary crate. Its library modules contain a set of reusable actors that combine our private API with our actor system, and alongside them sits a collection of individual binaries that consume these actors and define the plumbing between them.
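For readers unfamiliar with the actor pattern mentioned above, here is a minimal sketch of the general idea using only the standard library. It is not tonari's actor framework: the `Actor` trait, the `spawn` helper, and the `FrameCounter` actor are all hypothetical names invented for this illustration. Each actor owns its state, runs on its own thread, and only ever sees messages delivered through a channel.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical actor interface: an actor owns its state and reacts to messages.
pub trait Actor: Send + 'static {
    type Message: Send + 'static;
    fn handle(&mut self, msg: Self::Message);
}

/// Run an actor on its own thread. The returned `Sender` is its mailbox;
/// the actor stops once every sender has been dropped, and its final
/// state is handed back through the `JoinHandle`.
pub fn spawn<A: Actor>(mut actor: A) -> (Sender<A::Message>, JoinHandle<A>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // Iterating a Receiver blocks for the next message and ends
        // when the channel is closed.
        for msg in rx {
            actor.handle(msg);
        }
        actor
    });
    (tx, handle)
}

/// Toy actor (an assumption for this example): counts frames and bytes.
pub struct FrameCounter {
    pub frames: u64,
    pub bytes: usize,
}

impl Actor for FrameCounter {
    type Message = Vec<u8>;

    fn handle(&mut self, frame: Vec<u8>) {
        self.frames += 1;
        self.bytes += frame.len();
    }
}
```

A binary in this style would mostly be plumbing: spawn a few actors, hand each one the mailboxes of the actors it should talk to, and wait for them to finish.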

Flags as far as the eye can see

We make extensive use of feature flags to allow development of our project on different OSes (like Brian's 1970s-era MacBook Pro) or different hardware configurations. This allows us to easily swap out camera hardware without extra runtime checks or using awful sed hacks. For example, Linux uses v4l2 (Video For Linux...2) to access most webcams, but other webcams might have their own SDK. To compile for platforms that don't use v4l2 or when an SDK isn't available for a particular OS, we can put those SDKs behind feature flags and export a common interface. As a (simplified) concrete example, let's say we have a common camera interface defined as a trait:

pub trait Capture {
    /// Capture a frame from a camera, returning a Vec of RGB image bytes.
    fn capture(&mut self) -> Vec<u8>;
}

Let's also say we have three different camera interfaces - v4l2, corevideo, and polaroid. We can make our binaries work exclusively with this trait to be flexible, and we can swap in different implementations of Capture with feature flags.

#[cfg(feature = "v4l2")]
mod v4l2 {
    // Bring the trait from the parent module into scope.
    use super::Capture;

    pub struct V4l2Capture {
        ...
    }

    impl Capture for V4l2Capture {
        fn capture(&mut self) -> Vec<u8> {
            ...
        }
    }
}

#[cfg(feature = "corevideo")]
mod corevideo {
    use super::Capture;

    pub struct CoreVideoCapture {
        ...
    }

    impl Capture for CoreVideoCapture {
        fn capture(&mut self) -> Vec<u8> {
            ...
        }
    }
}

#[cfg(feature = "polaroid")]
mod polaroid {
    use super::Capture;

    pub struct PolaroidCapture {
        ...
    }

    impl Capture for PolaroidCapture {
        fn capture(&mut self) -> Vec<u8> {
            ...
        }
    }
}

#[cfg(feature = "v4l2")]
pub type VideoCapture = v4l2::V4l2Capture;

#[cfg(feature = "corevideo")]
pub type VideoCapture = corevideo::CoreVideoCapture;

#[cfg(feature = "polaroid")]
pub type VideoCapture = polaroid::PolaroidCapture;

If we make our code work with things which implement the Capture trait instead of concrete types, we can now compile on and target various platforms by simply toggling feature flags. For example, we can have a struct which has a field - video_capture: Box<dyn Capture> which will let us store any type which can Capture from a camera. An example Cargo.toml file to support the capture implementations we wrote above might look something like this:

[package]
name = "tonari"
version = "1.0.0"
edition = "2018"

[features]
default = ["v4l2"]
macos = ["corevideo"]
classic = ["polaroid"]
v4l2 = ["rscam"]

[dependencies]
rscam = { version = "0.5", optional = true }     # v4l2 linux camera library
corevideo = { version = "0.1", optional = true } # MacOS camera library
polaroid = { version = "0.1", optional = true }  # Polaroid camera library (very slow FPS)

This way we can avoid building and linking to platform-specific libraries like v4l2 which aren't available everywhere.
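To make the Box<dyn Capture> idea above a little more concrete, here is a small self-contained sketch. The Capture trait is the one from this post; `SolidColorCapture` and `VideoPipeline` are hypothetical names made up for the example, and the "camera" just fabricates a solid-colour frame so the code runs without any hardware or feature flags.

```rust
/// The common camera interface from the post.
pub trait Capture {
    /// Capture a frame from a camera, returning a Vec of RGB image bytes.
    fn capture(&mut self) -> Vec<u8>;
}

/// Hypothetical stand-in backend: returns a solid-colour frame instead of
/// talking to real hardware.
pub struct SolidColorCapture {
    pub width: usize,
    pub height: usize,
    pub rgb: [u8; 3],
}

impl Capture for SolidColorCapture {
    fn capture(&mut self) -> Vec<u8> {
        // Repeat the three colour bytes once per pixel.
        self.rgb
            .iter()
            .copied()
            .cycle()
            .take(self.width * self.height * 3)
            .collect()
    }
}

/// Hypothetical consumer that only knows about the trait, never the
/// concrete backend behind it.
pub struct VideoPipeline {
    video_capture: Box<dyn Capture>,
}

impl VideoPipeline {
    pub fn new(video_capture: Box<dyn Capture>) -> Self {
        Self { video_capture }
    }

    /// Grab one frame and report its size in bytes.
    pub fn next_frame_len(&mut self) -> usize {
        self.video_capture.capture().len()
    }
}
```

Any of the feature-gated implementations could be boxed and passed to `VideoPipeline::new` in the same way, so the code consuming frames never has to change when the camera backend does.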

Learning Rust on the job

A year after switching over to Rust, we onboarded our fourth engineer to the team, who didn't have much prior experience in either Rust or systems engineering. While the learning curve is undeniable (borrow checker, my old friend), we've found that Rust is incredibly empowering for those new to lower-level programming.

As mentioned, memory and concurrency safety built into the language means that not only does an entire class of problems fail to compile, but the compiler itself is often the only teacher you need, since its messages are so descriptive. Much has already been written about Rust's great compiler messages, as well as excellent docs (for example, take a look at this lengthy discussion on strings), and in our case as well these have been incredibly helpful resources.

There is usually one obvious "right way" to do a thing in Rust, unlike many other languages. Code that isn't written the "right way" tends to stand out, and is easy to pick out in reviews, oftentimes automatically by cargo clippy.

In practice, this has meant that new engineers can quickly start contributing production-ready code. Code reviews can remain focused on the implementation, as opposed to expending energy doing manual correctness checks.

The day-to-day

An IDE census

In the IDE department, Rust still shows its relative immaturity compared to some of its predecessors. This year especially, though, there have been huge strides, and each of us has found a pretty comfortable development environment at this point.

• 1 of us uses macOS, and 3 of us use Linux (Arch, Ubuntu, and Pop!_OS, revealing our respective levels of masochism)
• 2 of us use VS Code with the rust-analyzer plugin, and 2 of us use Sublime Text with RustEnhanced.

We’re often sharing setups and trying each other’s out (except Brian, who is stuck in his ways at the ripe old age of 29), and we’re constantly keeping an eye on new development tools that can help us work better together.

Code style guidelines are dead, long live `rustfmt`

You know what's wild? We don't have a code style guideline document that you have to read before submitting code. We don't need one. We just enforce rustfmt. Let me tell you: it really takes the edge off of code reviews.

How we review code

Our code reviews are straightforward since there are only four of us so far, and we are lucky to have a lot of trust amongst us. Our main goal is to have at least two pairs of eyes on every line of code, and to not block each other so we can maintain momentum.

Continuous testing

We use Google Cloud Build for running our CI build, as our infrastructure stack is mostly built on GCP and it allows for easy-ish tweaking of build machine specs and custom build images. It's triggered for every commit and runs cargo clippy, cargo build, and cargo test. We pass -D warnings to the compiler to promote warnings into errors, ensuring our changes don’t make our poor fellow coworkers' rustc rain warnings on them the next time they pull changes. To improve the CI build time, we cache the target and .cargo directories in Cloud Storage, so they can be downloaded next time for an incremental build.

# Visit every workspace crate (any directory that contains a Cargo.toml).
for manifest in */Cargo.toml; do
  pushd "$(dirname "$manifest")"

  # Lint.
  cargo +"$NIGHTLY_TOOLCHAIN" clippy --no-default-features -- -D warnings

  # Build.
  time RUSTFLAGS="-D warnings" cargo build --no-default-features

  # Test.
  time cargo test --no-default-features

  popd
done

We've also heard good things about sccache and will evaluate it soon!

Integrating with existing C/C++ libraries

The Rust ecosystem is great, but there are huge existing projects out there which just aren't feasible to port to Rust yet without a huge time investment. webrtc-audio-processing is a good example. The benefits it provides (clear audio with no vocal echoes or feedback) are huge and porting it to Rust in the near-term is not likely (it's around 80k lines of C and C++ code).

Thankfully, Rust makes it quite easy to use existing C and C++ libraries. The bindgen crate does most of the heavy-lifting. Give it a header file in C or C++ and it will automatically generate (unsafe) Rust code which can call the functions defined in the header. At that point, it's up to you to create a higher level Rust crate which exposes a safe API.

A lot of this process is fairly automatic for libraries with straightforward or commonly-used build processes. Creating the higher level safe API is important though - the Rust API that bindgen provides is not very fun to use directly as it's unsafe and typically not very idiomatic. Fortunately, once you have a higher level API you can eventually swap the C library out with your own Rust version and consumers of the crate are none the wiser.
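As a rough illustration of what "a higher level Rust crate which exposes a safe API" means, here is a minimal sketch. The `ffi` module is a hand-written stand-in for what bindgen would generate (in a real crate it would be `extern "C"` declarations for a C library); the `gain_apply` function, its status codes, and `GainError` are all hypothetical and not taken from webrtc-audio-processing or any real library.

```rust
/// Stand-in for bindgen output: a raw, pointer-based, unsafe function.
/// Here it is implemented in Rust purely so the example runs on its own.
mod ffi {
    /// Multiplies `len` samples in place. Returns 0 on success, -1 on a null pointer.
    ///
    /// Safety: `samples` must point to `len` valid, writable f32 values.
    pub unsafe fn gain_apply(samples: *mut f32, len: usize, gain: f32) -> i32 {
        if samples.is_null() {
            return -1;
        }
        let slice = unsafe { std::slice::from_raw_parts_mut(samples, len) };
        for s in slice {
            *s *= gain;
        }
        0
    }
}

/// Error type for the safe wrapper, carrying the raw status code.
#[derive(Debug, PartialEq)]
pub struct GainError(pub i32);

/// Safe API: the slice guarantees a valid pointer and length, so callers
/// never write `unsafe` themselves, and C-style status codes become a Result.
pub fn apply_gain(samples: &mut [f32], gain: f32) -> Result<(), GainError> {
    // SAFETY: pointer and length both come from the same live, mutable slice.
    let status = unsafe { ffi::gain_apply(samples.as_mut_ptr(), samples.len(), gain) };
    if status == 0 {
        Ok(())
    } else {
        Err(GainError(status))
    }
}
```

Because callers only ever see `apply_gain`, the body of `ffi` could later be replaced by a pure Rust implementation without changing any consuming code.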

These features let us work with APIs and hardware which would either never have a native Rust API or take months or years to re-implement. Low-level OS libraries, large codebases such as webrtc-audio-processing, and manufacturer-supplied camera SDKs all become available for use in our Rust codebase without having to move our entire application language over to C++, while still performing as if we had.

C++-specific quirks

Some C++ libraries are difficult to interface with directly from Rust. You have to whitelist types because bindgen can't handle all the std::* types that get pulled in, it doesn't play well with templated functions and copy/move constructors, and a whole host of other issues documented here.

To get around these issues, we'll typically create a simplified C++ header and source wrapper which exports bindgen-friendly functions. It's a bit more work, but far less work than porting the entire library to Rust. You can see an example of this wrapper creation here.

With all of Rust's ecosystem, and C/C++ projects being only a bindgen invocation away, we have easy access to some of the highest quality software packages in existence, all without having to sacrifice execution speed.

Pain points of Rust

Rust is not problem-free. It's a relatively new language that is constantly evolving, and there are shortcomings that you should consider when evaluating a move to Rust. Here is our non-exhaustive list:

Long compile times; as in the popular xkcd comic, the coffee break while waiting for Rust code to compile is very real. Our codebase for example takes about 8 mins to compile non-incrementally on a moderately beefy laptop, but it can be a lot worse. The Rust compiler has a lot of work to do enforcing strong language guarantees, and it must compile your entire dependency tree from source. Incremental builds are better, but some crates come with build scripts that pull and compile non-Rust dependency code, and the build cache may need to be cleared when upgrading their versions and switching branches.
Library coverage; The library ecosystem is quite mature but the coverage is limited compared to C / C++. We ended up implementing our own jitter buffer, and we also wrap several C / C++ libraries with Rust's bindgen, which means we have unsafe regions in our Rust code. Non-trivial projects tend to have some minimal amount of unsafe coding, which adds to the learning curve and chance of memory bugs.
Rust demands that you write correct and explicit code up-front. If you get it wrong, the compiler won't let it slide. If you care less about concurrency and memory guarantees, development can feel needlessly slow. Rust developers are constantly working on improving the error messages, though. They are friendly and actionable, often with an included fix suggestion. A good foundational model of memory & concurrency also helps in getting over the initial hump more quickly, so we suggest taking time to truly understand the language and its guarantees.
Rust's type inferencer is so strong it makes you feel sometimes like you're using a dynamically-typed language. That said, there comes the moment where it does not quite work the way you want, especially when generics and deref coercion are involved, and you end up having to fumble around to make the inferencer happy. It can come with frustration, and it's really helpful to have someone in the team who already has gone through that stage of learning. With enough patience, that frustration often turns into a wow moment, with a deeper understanding of the language design and why it's done that way, as well as a possible bug that you would have otherwise introduced.
Language evolution; The Rust language is constantly evolving. Some of the language constructs like async are still volatile, and you may find it's best to stick with threads and the standard library when you can.
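To illustrate the type inference point in the list above, here is a small sketch of two situations we believe are typical: inference flowing backwards from a return type, and deref coercion working for a concrete parameter but not for a generic one. All function names here (`shout`, `shout_generic`, `parse_ports`, `demo`) are made up for the example.

```rust
/// Concrete parameter type: deref coercion can turn `&Box<String>` or
/// `&String` into `&str` at the call site.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

/// Generic parameter: the compiler has no concrete target type to coerce to,
/// so the argument must already implement `AsRef<str>` as written.
fn shout_generic<T: AsRef<str>>(s: T) -> String {
    s.as_ref().to_uppercase()
}

/// Inference at its best: the `Vec<u16>` return type tells `collect` what to
/// build, which in turn tells `parse` to produce `u16`. No annotations needed.
pub fn parse_ports(raw: &str) -> Vec<u16> {
    raw.split(',')
        .filter_map(|p| p.trim().parse().ok())
        .collect()
}

pub fn demo() -> (String, String) {
    let boxed: Box<String> = Box::new(String::from("tonari"));

    // Works: `&Box<String>` is coerced to `&String` and then to `&str`.
    let a = shout(&boxed);

    // `shout_generic(&boxed)` would be rejected, because `&Box<String>` does
    // not implement `AsRef<str>` and no coercion is attempted for `T`.
    // Being explicit about the type you want resolves it:
    let b = shout_generic(boxed.as_str());

    (a, b)
}
```

The fix is usually a one-liner like `as_str()` or a type annotation, but finding it the first time is where the fumbling tends to happen.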

Consequences of selecting Rust so far

Experiencing no software-related downtime so far is both a pleasant surprise, and a testament to the safety provided by Rust's guarantees. Rust has also made it easy to write performant code with efficient resource usage - both our CPU and memory usage have been predictable and consistent. Without a garbage collector, we can guarantee consistent latency and frame rates.

Our experience maintaining a Rust codebase has also been great. We've been able to introduce significant improvements to our latency through sizable changes to our codebase with confidence. A clean compile doesn't always imply everything will work, but honestly, that's been the case more often than not.

The end result is a reliable product which hasn't been a nightmare to maintain (strong words, we know), and performs quickly at the high specs we demand for frame rate, latency, and resource efficiency. Again, it's hard to imagine where we might be without Rust!

Open source

We've open sourced one FFI crate so far, webrtc-audio-processing. This is one of the crates that used to live in the top level of our repo, and there are many more like it on their way to open-source.

There will be more on this subject later as we release more code, but one thing feels true: even before open-sourcing our crates, it’s felt very healthy for our code's clarity to assume each crate we create privately will be open-sourced. This philosophy keeps our boundaries between crates cleaner, and encourages us to make quicker decisions about opening up parts of our codebase with minimal fuss.

Thanks

Thanks for making it this far. We hope this brain-dump might have offered a useful thought or two for those getting into Rust, or those with advanced knowledge of Rust but using it in different environments. Please feel free to say hi at hey@tonari.no or find us on Twitter with any feedback.

Note: In an earlier version of this post, we called our audio "audiophile-quality" which is a phrase that, at this point, could mean anything. We've updated the post with more details.