Stefan Hajnoczi: February 2011

Friday, February 25, 2011

How to access virtual machine image files from the host

I am going to explain how to mount or access virtual machine disk images from the host using qemu-nbd.

Often you want to access an image file from the host to:

Copy files in before starting a new virtual machine.
Customize configuration like setting a hostname or networking details.
Troubleshoot a virtual machine that fails to boot.
Retrieve files after you've decided to stop using a virtual machine.

There is actually a toolkit for accessing image files called libguestfs. Take a look at that first but what follows is the poor man's version using tools that come with QEMU.

Required packages

The required programs are the qemu-nbd tool and (optionally) kpartx for detecting partitions.

On Debian-based distros the packages are called qemu-utils and kpartx.

On RHEL 6 nbd support is not available out-of-the-box but you can build from source if you wish.

Please leave package names for other distros in the comments!

Remember to back up important data

Consider making a backup copy of the image file before trying this out. Especially if you don't work with disk images often it can be easy to lose data with a wrong command.

Attaching an image file

The goal is to make an image file appear on the host as a block device so it can be mounted or accessed with tools like fdisk or fsck. The image file can be in any format that QEMU supports including raw, qcow2, qed, vdi, vmdk, vpc, and others.

1. Ensure the nbd driver is loaded

The Network Block Device driver in Linux needs to be loaded:

modprobe nbd

The qemu-nbd tool will use the nbd driver to create block devices and perform I/O.

2. Connect qemu-nbd

Before you do this, make sure the virtual machine is not running! It is generally not safe to access file systems from two machines at once and this applies for virtual machines and the host.

There should be many /dev/nbdX devices available now and you can pick an unused one as the block device through which to access the image:

sudo qemu-nbd -c /dev/nbd0 path/to/image/file

Don't be surprised that there is no output from this command. On success the qemu-nbd tool exits and leaves a daemon running in the background to perform I/O. You can now access /dev/nbd0 or whichever nbd device you picked like a regular block device using mount, fdisk, fsck, and other tools.

3. (Optionally) detect partitions

The kpartx utility automatically sets up partitions for the disk image:

sudo kpartx -a /dev/nbd0

They would be named /dev/nbd0p1, /dev/nbd0p2, and so on.

Detaching an image file

When all block devices and partitions are no longer mounted or in use you can clean up as follows.

1. (Optionally) forget partitions

sudo kpartx -d /dev/nbd0

2. Disconnect qemu-nbd

sudo qemu-nbd -d /dev/nbd0

3. Remove the nbd driver

Once there are no more attached nbd devices you may wish to unload the nbd driver:

sudo rmmod nbd

More features: read-only, throwaway snapshots, and friends

The qemu-nbd tool has more features that are worth looking at:

Ensuring read-only access using the --read-only option.
Allowing write access but not saving changes to the image file using the --snapshot option. You can think of this as throwaway snapshots.
Exporting the image file over the network using the --bind and --port options. Drop the -c option because no local nbd device is used in this case. Only do this on secure private networks because there is no access control.

Hopefully this has helped you quickstart qemu-nbd for accessing image files from the host. Feel free to leave questions in the comments below.

Wednesday, February 23, 2011

Observability using QEMU tracing

I am going to describe the tracing feature in the QEMU and KVM.

Overview of QEMU tracing

Tracing is available for the first time in QEMU 0.14.0 and qemu-kvm 0.14.0. It's an optional feature and may not be enabled in distro packages yet, but it's there if you are willing to build from source.

QEMU tracing is geared towards answering questions about running virtual machines:

What I/O accesses are being made to emulated devices?
How long are disk writes taking to complete inside QEMU?
Is QEMU leaking memory or other resources by not freeing them?
Are network packets being received but filtered at the QEMU level?

In order to find answers to these questions we place trace events into the QEMU source code at strategic points. For example, every qemu_malloc() and qemu_free() call can be traced so we know what heap memory allocations are going on.

Current status

Today QEMU tracing is useful to developers and anyone troubleshooting or investigating bugs.

The set of trace events that comes with QEMU is limited but already useful for observing the block layer and certain emulated hardware. Developers are adding trace events to new code and converting some existing debug printfs to trace events. I expect the default set of trace events to grow and become more useful in the future.

Trace events are currently not a stable API so scripts that work with one version of QEMU are not guaranteed to work with another version. There is also no documentation on the semantics of particular trace events, so it is necessary to understand the code which contains the trace event to know its meaning. In the future we can make stable trace events with explicit semantics like "packet received from host".

QEMU tracing cross-platform support

You have a choice of trace backends: SystemTap, LTTng Userspace Tracer, and a built-in "simple" tracer are supported. DTrace could be added with little effort on Solaris, Mac OSX, and FreeBSD host platforms.

The available set of trace events is the same no matter which trace backend you choose.

Where to find out more

If you want to get started, check out the documentation that comes are part of QEMU.

Also check out the excellent QEMU 0.14.0 changelog for pointers related to tracing.

I looking forward to writing more about tracing in the future and sharing trace analysis scripts. In fact, I just submitted a patch to provide a Python API for processing trace files generated by the "simple" trace backend. It makes analyzing trace files quick and fun :).

Monday, February 21, 2011

Near instant kernel development cycle with KVM

I want to share my setup for rapid kernel development using KVM.

A fast development cycle makes a huge difference to productivity. For firmware and kernel development many areas can be efficiently tested inside virtual machines.

Traditionally physical test machines were used but virtualization lets you take the lab with you. This means working offline without giving up on testing.

In that past I used QEMU when working on the gPXE network bootloader. Now I am using KVM to test Linux kernel changes in less than 30 seconds and it's a really pleasant setup.

What can't be tested under KVM?

A lot of code can be tested in a virtual machine but device drivers or hardware-specific code often require physical machines. But with PCI device assignment, or passing physical PCI devices through into the virtual machine, it is becoming possible to test device drivers in a virtual machine too.

Testing kernels without disk images

Most virtual machines are booted from a disk image or an ISO file, but KVM can directly load a Linux kernel into memory skipping the bootloader. This means you don't need an image file containing the kernel and boot files. Instead, you can run a kernel directly like this:

qemu-kvm -kernel arch/x86/boot/bzImage -initrd initramfs.gz -append "console=ttyS0" -nographic

These flags directly load a kernel and initramfs from the host filesystem without the need to generate a disk image or configure a bootloader.

The optional -initrd flag loads an initramfs for the kernel to use as the root filesystem.

The -append flags adds kernel parameters and can be used to enable the serial console.

The -nographic option restricts the virtual machine to just a serial console and therefore keeps all test kernel output in your terminal rather than in a graphical window.

Building an initramfs

I don't use a distro initramfs generation utility because I like to control which files get included and the init script. Instead I use the linux-2.6/usr/gen_init_cio utility to build an initramfs cpio archive from a specification file. A neat feature of gen_init_cpio is that you don't need to be root in order to create device files or set ownership inside the initramfs. The specification file syntax looks like this:

# a comment
file <name> <location> <mode> <uid> <gid> [<hard links>]
dir <name> <mode> <uid> <gid>
nod <name> <mode> <uid> <gid> <dev_type> <maj> <min>
slink <name> <target> <mode> <uid> <gid>
pipe <name> <mode> <uid> <gid>
sock <name> <mode> <uid> <gid>

The kernel will execute the file at /init. I include busybox in the initramfs and have the following script:

#!/bin/sh
mount -t proc none /proc
mount -t sysfs none /sys
mount -t configfs none /sys/kernel/config
mount -t debugfs none /sys/kernel/debug
mount -t tmpfs none /tmp

# Test setup commands here:
insmod /lib/modules/$(uname -r)/kernel/...

exec /bin/sh -i

Instead of building out a full /lib/modules directory tree I just include those kernel module dependencies that I need. This means I use insmod(8) instead of modprobe(8) because I skip generating depmod(8) dependency metadata.

Tying it all together

Here are the steps I take to build and test a kernel:

cd linux-2.6
[...make some changes...]
make
usr/gen_init_cpio initramfs | gzip >initramfs.gz
qemu-kvm -kernel arch/x86/boot/bzImage -initrd initramfs.gz -append "console=ttyS0" -nographic

It takes about 28 seconds to the shell prompt inside the virtual machine with ccache and a hot page cache on this laptop. This keeps development fun :)!