Stefan Hajnoczi: August 2020

Saturday, August 29, 2020

Using kcov code coverage with meson

The meson build system has built-in code coverage support, making it easy to identify lines of code that are not exercised by tests. Meson's code coverage support works with the gcov-based tools gcovr and lcov. This post shows how to use kcov with meson instead so that code coverage can be reported when gcov is unavailable.

How do code coverage tools work?

The gcov-based tools rely on compiler instrumentation, which both gcc and llvm support. Special compiler options instruct the compiler to emit instrumentation in every compiled function in order to record which lines of code are reached.

The kcov tool takes a different approach that does not require compiler support. It uses run-time instrumentation (like breakpoints) instead of compile-time instrumentation. This makes it possible to use kcov on existing binaries without recompilation, as long as debug information is available. The tool maps program instructions to lines of source code using the debug information.

There are pros and cons regarding exact features, performance, limitations, etc. For the most part the gcov approach works well when recompilation is possible and the compiler supports gcov. In other cases kcov is needed.

How to run meson tests under kcov

Meson's built-in code coverage support is designed for gcov and therefore works as a post-processing step after meson test was run. The workflow is different with kcov since the test itself must be run under kcov so it can instrument the process.

Run meson test as follows to get per-test coverage results:

$ meson test --wrapper='kcov kcov-output'

The $BUILD_DIR/kcov-output/ directory will contain the coverage results, one set for each test that was run.

Merging coverage results

If your goal is a single coverage percentage for the entire test suite, then the per-test results need to be merged. The follow wrapper script can be used:

$ cat kcov-wrapper.sh
#!/bin/sh
test_name=$(basename $1)
exec kcov kcov-runs/$test_name "$@"

And it is invoked like this:

$ rm -rf $BUILD_DIR/kcov-runs
$ mkdir $BUILD_DIR/kcov-runs
$ meson test --wrapper "$SOURCE_DIR/kcov-wrapper.sh"
$ rm -rf $BUILD_DIR/kcov-output
$ kcov --merge $BUILD_DIR/kcov-output $BUILD_DIR/kcov-runs/*

The merged results are located in the $BUILD_DIR/kcov-output/ directory.

Conclusion

Meson already has built-in support for gcov-based code coverage. If you cannot use gcov, then kcov is an alternative that is fairly easy to integrate into a meson project.

Monday, August 24, 2020

QEMU Internals: Event loops

This post explains event loops in QEMU v5.1.0 and their unique features compared to other event loop implementations. The APIs are not covered in detail here since they are explained in doc comments. Instead, the focus is on the big picture and why things work the way they do.

Event loops are central to many I/O-bound applications like network services and graphical desktop applications. QEMU also has I/O-bound work that fits well into an event loop. Examples include the QMP monitor, disk I/O, and timers.

An event loop monitors event sources for activity and invokes a callback function when an event occurs. This makes it possible to process multiple event sources within a single CPU thread. The application can appear to do multiple things at once without multithreading because it switches between handling different event sources. This architecture is common in Javascript, Python Twisted/asyncio, and many other environments. Sometimes the event loop is hidden underneath coroutines or async/await language features (QEMU has coroutines but often the event loop is still used directly).

The most important event sources in QEMU are:

File descriptors such as sockets and character devices.
Event notifiers (implemented as eventfds on Linux).
Timers for delayed function execution.
Bottom-halves (BHs) for invoking a function in another thread or deferring a function call to avoid reentrancy.

Event loops and threads

QEMU has several different types of threads:

vCPU threads that execute guest code and perform device emulation synchronously with respect to the vCPU.
The main loop that runs the event loops (yes, there is more than one!) used by many QEMU components.
IOThreads that run event loops for device emulation concurrently with vCPUs and "out-of-band" QMP monitor commands.

It's worth explaining how device emulation interacts with threads. When guest code accesses a device register the vCPU thread traps the access and dispatches it to an emulated device. The device's read/write function runs in the vCPU thread. The vCPU thread cannot resume guest code execution until the device's read/write function returns. This means long-running operations like emulating a timer chip or disk I/O cannot be performed synchronously in the vCPU thread since they would block the vCPU. Most devices solve this problem using the main loop thread's event loops. They add timer or file descriptor monitoring callbacks to the main loop and return back to guest code execution. When the timer expires or the file descriptor becomes ready the callback function runs in the main loop thread. The final part of emulating a guest timer or disk access therefore runs in the main loop thread and not a vCPU thread.

Some devices perform the guest device register access in the main loop thread or an IOThread thanks to ioeventfd. ioeventfd is a Linux KVM API and also has a userspace fallback implementation for TCG that traps vCPU device accesses and writes to a file descriptor so another thread can handle the access.

The key point is that vCPU threads do not run an event loop. The main loop thread and IOThreads run event loops. vCPU threads can add event sources to the main loop or IOThread event loops. Callbacks run in the main loop thread or IOThreads.

How the main loop and IOThreads differ

The main loop and IOThreads share some code but are fundamentally different. The common code is called AioContext and is QEMU's native event loop API. Commonly-used functions include aio_set_fd_handler(), aio_set_event_handler(), aio_timer_init(), and aio_bh_new().

The main loop actually has a glib GMainContext and two AioContext event loops. QEMU components can use any of these event loop APIs and the main loop combines them all into a single event loop function os_host_main_loop_wait() that calls qemu_poll_ns() to wait for event sources. This makes it possible to combine glib-based code with code using the native QEMU AioContext APIs.

The reason why the main loop has two AioContexts is because one, called iohandler_ctx, is used to implement older qemu_set_fd_handler() APIs whose handlers should not run when the other AioContext, called qemu_aio_context, is run using aio_poll(). The QEMU block layer and newer code uses qemu_aio_context while older code uses iohandler_ctx. Over time it may be possible to unify the two by converting iohandler_ctx handlers to safely execute in qemu_aio_context.

IOThreads have an AioContext and a glib GMainContext. The AioContext is run using the aio_poll() API, which enables the advanced features of the event loop. If a glib event loop is needed then the GMainContext can be run using g_main_loop_run() and the AioContext event sources will be included.

Code that relies on the AioContext aio_*() APIs will work with both the main loop and IOThreads. Older code using qemu_*() APIs only works with the main loop. glib code works with both the main loop and IOThreads.

The key difference between the main loop and IOThreads is that the main loop uses a traditional event loop that calls qemu_poll_ns() while IOThreads AioContext aio_poll() has advanced features that result in better performance.

AioContext features

AioContext has the following event loop features that traditional event loops do not have:

Adaptive polling support for lower latency but slightly higher CPU consumption. AioContext event sources can have a userspace polling function that detects events without performing syscalls (e.g. peeking at a memory location). This allows the event loop to avoid block syscalls that might lead the host kernel scheduler to yield the thread and put the physical CPU into a low power state. Keeping the CPU busy and avoiding entering the kernel minimizes latency.
O(1) time complexity with respect to the number of monitored file descriptors. When there are thousands of file descriptors O(n) APIs like poll(2) spend time scanning over all file descriptors, even those that have no activity. This scalability bottleneck can be avoided with Linux io_uring and epoll APIs, both of which are supported by AioContext aio_poll(2).
Nanosecond timers. glib's event loop only has millisecond timers, which is not sufficient for emulating hardware timers.

These features are required for performance reasons. Unfortunately glib's event loop does not support them, otherwise QEMU could use GMainContext as its only event loop.

Conclusion

QEMU uses both its native AioContext event loop and glib's GMainContext. The QEMU main loop and IOThreads work differently, with IOThreads offering the best performance thanks to its AioContext aio_poll() event loop. Modern QEMU code should use AioContext APIs for optimal performance and so that the code can be used in both the main loop and IOThreads.

Thursday, August 6, 2020

Why QEMU should move from C to Rust

Welcome Redditors and HackerNews folks! This post is getting attention outside the QEMU community, so I'd like to highlight two things that may not be immediately clear: I am a QEMU maintainer and I'm not advocating to Rewrite It In Rust. Enjoy! :)

My KVM Forum 2018 presentation titled Security in QEMU: How Virtual Machines provide Isolation (pdf) (video) reviewed security bugs in QEMU and found the most common causes were C programming bugs. This includes buffer overflows, use-after-free, uninitialized memory, and more. In this post I will argue for using Rust as a safer language that prevents these classes of bugs.

In 2018 the choice of a safer language was not clear. C++ offered safe abstractions without an effective way to prohibit unsafe language features. Go also offered safety but with concerns about runtime costs. Rust looked promising but few people had deep experience with it. In 2018 I was not able to argue confidently for moving away from C in QEMU.

Now in 2020 the situation is clearer. C programming bugs are still the main cause of CVEs in QEMU. Rust has matured, its ecosystem is growing and healthy, and there are virtualization projects like Crosvm, Firecracker, and cloud-hypervisor that prove Rust is an effective language for writing Virtual Machine Monitors (VMM). In the QEMU community Paolo Bonzini and Sergio Lopez's work on rust-vmm and vhost-user code inspired me to look more closely at moving away from C.

Do we need to change programming language?

Most security bugs in QEMU are C programming bugs. This is easy to verify by looking through the CVE listings. Although I have only reviewed CVEs it seems likely that non-security bugs are also mostly C programming bugs.

Eliminating C programming bugs does not necessarily require switching programming languages. Other approaches to reducing bug rates in software include:

Coding style rules that forbid unsafe language features.
Building safe abstractions and prohibiting unsafe language features or library APIs.
Static checkers that scan source code for bugs.
Dynamic sanitizers that run software with instrumentation to identify bugs.
Unit testing and fuzzing.

The problem is, the QEMU community has been doing these things for years but new bugs are still introduced despite these efforts. It is certainly possible to spend more energy on these efforts but the evidence shows that bugs continue to slip through.

There are two issues with these approaches to reducing bugs. First, although these approaches help find existing bugs, eliminating classes of bugs so they cannot exist in the first place is a stronger approach. This is hard to do with C since the language is unsafe, placing the burden of safety on the programmer.

Second, much of the ability to write safe C code comes with experience. Custom conventions, APIs, tooling, and processes to reduce bugs is a hurdle for one-time contributors or newcomers. It makes the codebase inaccessible unless we accept lower standards for some contributors. Code quality should depend as little on experience as possible but C is notorious for being a programming language that requires a lot of practice before you can write production-quality code.

Why Rust?

Safe languages eliminate memory safety bugs (and other classes like concurrency bugs). Rust made this a priority in its design:

Use-after-free, double-free, memory leaks, and other lifetime bugs are prevented at compile-time by the borrow checker where the compiler checks ownership of data.
Buffer overflows and other memory corruptions are prevented by compile-time and runtime bounds-checking.
Pointer deference bugs are prevented by the absense of NULL pointers and strict ownership rules.
Uninitialized memory is prevented because all variables and fields must be initialized.

Rust programs can still "panic" at runtime when safety cannot be proven at compile time but this does not result in undefined behavior as seen in C programs. The program simply aborts with a backtrace. Bugs that could have resulted in arbitrary code execution in C become at most denial-of-service bugs in Rust. This reduces the severity of bugs.

As a result of this language design most C programming bugs that plague QEMU today are either caught by the compiler or turn into a safe program termination. It is reasonable to expect CVEs to reduce in number and in severity when switching to Rust.

At the same time Rust eliminates the need for many of the measures that the QEMU community added onto C because the Rust programming language and its compiler already enforce safety. This means newcomers and one-time contributors will not need QEMU-specific experience, can write production-quality code more easily, and can get their code merged more quickly. It also means reviewers will have to spend less time pointing out C programming bugs or asking for changes that comply with QEMU's way of doing things.

That said, Rust has a reputation for being a scary language due to the borrow checker. Most programmers have not thought about object lifetimes and ownership as systematically and explicitly as required by Rust. This raises the bar to learning the language, but I look at it this way: learning Rust is humanly possible, writing bug-free C code is not.

How can we change programming language?

When I checked in 2018 QEMU was 1.5 million lines of code. It has grown since then. Moving a large codebase to a new programming language is extremely difficult. If people want to convert QEMU to Rust that would be great, but I personally don't have the appetite to do it because I think the integration will be messy, result in a lot of duplication, and there is too much un(der)maintained code that is hard to convert.

The reason I am writing this post is because device emulation, the main security attack surface for VMMs, can be done in a separate program. That program can be written in any language and this is where Rust comes in. For vhost devices it is possible to write Rust device backends today and I hope this will become the default approach to writing new devices.

For non-vhost devices the vfio-user project is working on an interface out-of-process device emulation. It will be possible to implement devices in Rust there too.

If you are implementing new device emulation code please consider doing it in Rust!

Conclusion

Most security bugs in QEMU today are C programming bugs. Switching to a safer programming language will significantly reduce security bugs in QEMU. Rust is now mature and proven enough to use as the language for device emulation code. Thanks to vhost-user and vfio-user using Rust for device emulation does not require a big conversion of QEMU code, it can simply be done in a separate program. This way attack surfaces can be written in Rust to make them less susceptible to security bugs going forward.