Welcome Redditors and HackerNews folks! This post is getting attention outside the QEMU community, so I'd like to highlight two things that may not be immediately clear: I am a QEMU maintainer and I'm not advocating to Rewrite It In Rust. Enjoy! :)
My KVM Forum 2018 presentation titled Security in QEMU: How Virtual Machines provide Isolation (pdf) (video) reviewed security bugs in QEMU and found the most common causes were C programming bugs. This includes buffer overflows, use-after-free, uninitialized memory, and more. In this post I will argue for using Rust as a safer language that prevents these classes of bugs.
In 2018 the choice of a safer language was not clear. C++ offered safe abstractions without an effective way to prohibit unsafe language features. Go also offered safety but with concerns about runtime costs. Rust looked promising but few people had deep experience with it. In 2018 I was not able to argue confidently for moving away from C in QEMU.
Now in 2020 the situation is clearer. C programming bugs are still the main cause of CVEs in QEMU. Rust has matured, its ecosystem is growing and healthy, and there are virtualization projects like Crosvm, Firecracker, and cloud-hypervisor that prove Rust is an effective language for writing Virtual Machine Monitors (VMMs). In the QEMU community Paolo Bonzini and Sergio Lopez's work on rust-vmm and vhost-user code inspired me to look more closely at moving away from C.
Do we need to change programming language?
Most security bugs in QEMU are C programming bugs. This is easy to verify by looking through the CVE listings. Although I have only reviewed CVEs, it seems likely that non-security bugs are also mostly C programming bugs.
Eliminating C programming bugs does not necessarily require switching programming languages. Other approaches to reducing bug rates in software include:
- Coding style rules that forbid unsafe language features.
- Building safe abstractions and prohibiting unsafe language features or library APIs.
- Static checkers that scan source code for bugs.
- Dynamic sanitizers that run software with instrumentation to identify bugs.
- Unit testing and fuzzing.
The problem is, the QEMU community has been doing these things for years but new bugs are still introduced despite these efforts. It is certainly possible to spend more energy on these efforts but the evidence shows that bugs continue to slip through.
There are two issues with these approaches to reducing bugs. First,
although these approaches help find existing bugs, eliminating classes
of bugs so they cannot exist in the first place is a stronger approach.
This is hard to do with C since the language is unsafe, placing the
burden of safety on the programmer.
Second, much of the ability to write safe C code comes with
experience. Custom conventions, APIs, tooling, and processes
to reduce bugs are a hurdle for one-time contributors and
newcomers. They make the codebase inaccessible unless we accept lower
standards for some contributors. Code quality should depend as little
on experience as possible but C is notorious for being a programming
language that requires a lot of practice before you can write
production-quality code.
Why Rust?
Safe languages eliminate memory safety bugs (and other classes like
concurrency bugs). Rust made this a priority in its design:
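- Use-after-free, double-free, and other lifetime bugs are prevented at compile time by the borrow checker, which checks ownership of data. Memory leaks become much less common because owned values are freed automatically when they go out of scope, although safe Rust does not strictly rule them out.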
- Buffer overflows and other memory corruptions are prevented by compile-time and runtime bounds-checking.
- Pointer dereference bugs are prevented by the absence of NULL pointers and strict ownership rules.
- Uninitialized memory is prevented because all variables and fields must be initialized.
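To make the list above concrete, here is a minimal sketch of how these guarantees look in code. `RegisterFile` is a hypothetical emulated-device structure invented for illustration; it is not taken from QEMU or rust-vmm. It shows mandatory initialization, bounds-checked access, and `Option` standing in for a nullable pointer.

```rust
/// Hypothetical register file of an emulated device (illustration only).
struct RegisterFile {
    regs: [u32; 4],
}

impl RegisterFile {
    fn new() -> Self {
        // Every field must be given a value here; safe Rust has no way to
        // leave `regs` uninitialized.
        RegisterFile { regs: [0; 4] }
    }

    /// Read a register at a guest-supplied index.
    /// Returns `None` for an out-of-range index instead of reading past
    /// the end of the array.
    fn read(&self, index: usize) -> Option<u32> {
        self.regs.get(index).copied()
    }

    /// Write a register at a guest-supplied index.
    /// Returns `false` and changes nothing if the index is out of range.
    fn write(&mut self, index: usize, value: u32) -> bool {
        match self.regs.get_mut(index) {
            Some(slot) => {
                *slot = value;
                true
            }
            None => false,
        }
    }
}
```

A caller has to handle the `None` case before it can use the value, so the "forgot to check for NULL" mistake does not compile. Likewise, using `RegisterFile` after it has been moved or dropped is rejected by the compiler rather than becoming a use-after-free at runtime.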
Rust programs can still "panic" at runtime when safety cannot be
proven at compile time but this does not result in undefined behavior
as seen in C programs. The program simply aborts with a backtrace. Bugs
that could have resulted in arbitrary code execution in C become at
most denial-of-service bugs in Rust. This reduces the severity of
bugs.
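The difference between a panic and undefined behavior can be sketched as follows. The helper names `read_byte` and `try_read_byte` are made up for this example. By default a panic unwinds the stack and ends the thread; `std::panic::catch_unwind` is used here only to make the controlled failure observable, not as a recommended error-handling pattern (it also has no effect when a program is built with `panic = "abort"`).

```rust
use std::panic;

/// Hypothetical helper: read one byte at a guest-controlled offset.
/// Slice indexing is bounds-checked, so an out-of-range offset panics
/// instead of reading adjacent memory.
fn read_byte(buf: &[u8], offset: usize) -> u8 {
    buf[offset]
}

/// Run `read_byte` and report a panic as `Err(())`.
/// This is for demonstration only: it shows that the failure is a
/// well-defined event that never touches memory outside `buf`.
fn try_read_byte(buf: &[u8], offset: usize) -> Result<u8, ()> {
    panic::catch_unwind(|| read_byte(buf, offset)).map_err(|_| ())
}
```

In C the equivalent out-of-bounds read would silently return whatever happens to sit next to the buffer. Here the worst case is that the device backend stops, which is the denial-of-service outcome described above.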
As a result of this language design most C programming bugs that
plague QEMU today are either caught by the compiler or turn into a safe
program termination. It is reasonable to expect CVEs to decrease in
number and in severity when switching to Rust.
At the same time Rust eliminates the need for many of the measures
that the QEMU community added onto C because the Rust programming
language and its compiler already enforce safety. This means newcomers
and one-time contributors will not need QEMU-specific experience, can
write production-quality code more easily, and can get their code
merged more quickly. It also means reviewers will have to spend less
time pointing out C programming bugs or asking for changes that comply
with QEMU's way of doing things.
That said, Rust has a reputation for being a scary language due to
the borrow checker. Most programmers have not thought about object
lifetimes and ownership as systematically and explicitly as required
by Rust. This raises the bar to learning the language, but I look at it
this way: learning Rust is humanly possible, writing bug-free C code is
not.
How can we change programming language?
When I checked in 2018 QEMU was 1.5 million lines of code. It has
grown since then. Moving a large codebase to a new programming language
is extremely difficult. If people want to convert QEMU to Rust that
would be great, but I personally don't have the appetite to do it
because I think the integration will be messy, result in a lot of
duplication, and there is too much un(der)maintained code that is hard
to convert.
The reason I am writing this post is that device emulation, the
main security attack surface for VMMs, can be done in a separate
program. That program can be written in any language and this is where
Rust comes in. For vhost devices it is possible to write Rust
device backends today and I hope this will become the default approach to
writing new devices.
For non-vhost devices the vfio-user
project is working on an interface for out-of-process device emulation. It
will be possible to implement devices in Rust there too.
If you are implementing new device emulation code please consider
doing it in Rust!
Conclusion
Most security bugs in QEMU today are C programming bugs. Switching
to a safer programming language will significantly reduce security bugs
in QEMU. Rust is now mature and proven enough to use as the language
for device emulation code. Thanks to vhost-user and vfio-user, using
Rust for device emulation does not require a big conversion of QEMU
code; it can simply be done in a separate program. This way attack
surfaces can be written in Rust to make them less susceptible to
security bugs going forward.