Tuesday, September 1, 2015

KVM Forum 2015 slides are available!

KVM Forum 2015 was co-located with LinuxCon North America and Linux Plumbers Conference in Seattle, Washington.

The slides and videos for talks are being posted here.

Some of my favorite talks included:

  • Towards multi-threaded TCG by Alex Bennée and Frederic Konrad. Great overview of the TCG just-in-time compiler and how it needs to be extended to support SMP guests.
  • KVM Message Passing Performance by David Matlack. An analysis of message-passing performance whose findings also apply to other workloads. The latency diagrams were particularly useful in showing where the overhead is.
  • Using IPMI in QEMU by Corey Minyard. Who would have thought that IPMI would attract this audience and get so much interest? Corey gave a great overview of what IPMI is and how QEMU can support it. Hopefully this work will be upstream soon.

Wednesday, August 19, 2015

virtio-vsock: Zero-configuration host/guest communication

Slides are available for my talk at KVM Forum 2015 about virtio-vsock: Zero-configuration host/guest communication.

virtio-vsock is a new host/guest communications mechanism that allows applications to use the Sockets API to communicate between the hypervisor and virtual machines. It uses the AF_VSOCK address family, which was introduced in Linux in 2013.

There are several advantages over virtio-serial. The main advantage is the familiar Sockets API semantics, which are more convenient than serial ports. See the slides for full details on what virtio-vsock offers.
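
To illustrate the Sockets API point, here is a minimal sketch of a guest application sending data to the host over AF_VSOCK. It assumes Python 3.7 or later on Linux with vsock support in the kernel. The port number 1234 and the function names are made up for this example; CID 2 is the well-known host address (VMADDR_CID_HOST).

```python
import socket

# Hypothetical port number, chosen for illustration only.
EXAMPLE_PORT = 1234
# Well-known CID of the host (VMADDR_CID_HOST in linux/vm_sockets.h).
HOST_CID = 2


def vsock_address(cid, port):
    """Return the (CID, port) tuple that AF_VSOCK sockets use as an address."""
    if not 0 <= cid < 2**32 or not 0 <= port < 2**32:
        raise ValueError("CID and port are 32-bit unsigned integers")
    return (cid, port)


def send_to_host(message, port=EXAMPLE_PORT):
    """Run inside a guest: connect to a listener on the host and send bytes."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as sock:
        sock.connect(vsock_address(HOST_CID, port))
        sock.sendall(message)
```

Apart from the address family and the (CID, port) address, this is the same code one would write for a TCP client.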

Friday, August 14, 2015

Asynchronous file I/O on Linux: Plus ça change

In 2009 Anthony Liguori gave a presentation at Linux Plumbers Conference about the state of asynchronous file I/O on Linux. He talked about what was missing from POSIX AIO and Linux AIO APIs. I recently got thinking about this again after reading the source code for the io_submit(2) system call.

Over half a decade has passed and plus ça change, plus c'est la même chose. Sure, there are new file systems, device-mapper targets, the multiqueue block layer, and high IOPS PCI SSDs. There's DAX for storage devices accessible via memory load/store instructions - radically different from the block device model.

However, the io_submit(2) system call remains a treacherous ally in the quest for asynchronous file I/O. I don't think much has changed since 2009 to make Linux AIO the best asynchronous file I/O mechanism.

The main problem is that io_submit(2) waits for I/O in some cases. It can block! This defeats the purpose of asynchronous file I/O because the caller is stuck until the system call completes. If called from a program's event loop, the program becomes unresponsive until the system call returns. But even if io_submit(2) is invoked from a dedicated thread where blocking doesn't matter, latency is introduced to any further I/O requests submitted in the same io_submit(2) call.

Sources of blocking in io_submit(2) depend on the file system and block devices being used. There are many different cases but in general they occur because file I/O code paths contain synchronous I/O (for metadata I/O or page cache write-out) as well as locks/waiting (for serializing operations). This is why the io_submit(2) system call can be held up while submitting a request.

This means io_submit(2) works best on fully-allocated files, volumes, or block devices. Anything else is likely to result in blocking behavior and cause poor performance.

Since these conditions don't apply in many cases, QEMU has its own userspace thread-pool with worker threads that call preadv(2)/pwritev(2). It would be nice to default to Linux AIO but the limitations are too serious.
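
The thread-pool idea can be sketched in a few lines. This is not QEMU's implementation (which is written in C); it is a Python illustration that assumes Linux and Python 3.7 or later for os.preadv(), and all function names are hypothetical. The caller queues requests and gets futures back immediately, while worker threads make the blocking preadv(2) calls.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def blocking_read(fd, offset, length):
    """Runs in a worker thread, so blocking here does not stall the caller."""
    buf = bytearray(length)
    nread = os.preadv(fd, [buf], offset)
    return bytes(buf[:nread])


def submit_reads(pool, fd, requests):
    """Queue (offset, length) reads and return futures without waiting for I/O."""
    return [pool.submit(blocking_read, fd, off, length) for off, length in requests]


def demo():
    """Read two ranges of a small temporary file through the pool."""
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789abcdef")
        f.flush()
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = submit_reads(pool, f.fileno(), [(0, 4), (10, 6)])
            return [fut.result() for fut in futures]
```

The cost of this design is a thread handoff for every request, which is the overhead that a non-blocking io_submit(2) would avoid.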

Have there been new developments or did I get something wrong? Let me know in the comments.

Wednesday, April 1, 2015

Tracing Linux kernel function entries/returns

Here is a neat ftrace recipe for tracing execution while the Linux kernel is inside a particular function.  This helps when a kernel function or its children are failing but you don't know where or why.

ftrace will trigger on particular functions if you give it set_graph_function values.  That way you only see traces from the functions you are interested in.  This eliminates the noise you get when tracing all function entries/returns without a filter.

Let's trace virtio_dev_probe() and all its children:

echo virtio_dev_probe >/sys/kernel/debug/tracing/set_graph_function
echo function_graph >/sys/kernel/debug/tracing/current_tracer
echo 1 >/sys/kernel/debug/tracing/tracing_on

modprobe transport_virtio

echo 0 >/sys/kernel/debug/tracing/tracing_on
echo >/sys/kernel/debug/tracing/current_tracer
echo >/sys/kernel/debug/tracing/set_graph_function
cat /sys/kernel/debug/tracing/trace
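
The same recipe can be wrapped in a small helper so that tracing is always switched off again, even if the traced action fails. This is a sketch rather than a polished tool: the function names are hypothetical, and it must run as root with debugfs mounted at the path used above. Unlike the shell commands, it reads the trace before resetting the tracer, and it resets the tracer by writing nop (ftrace's no-op tracer).

```python
import os

# debugfs mount point used in the recipe above.
TRACING_DIR = "/sys/kernel/debug/tracing"


def write_control(tracing_dir, name, value):
    """Equivalent of `echo value >tracing_dir/name`."""
    with open(os.path.join(tracing_dir, name), "w") as f:
        f.write(value + "\n")


def trace_function(function, action, tracing_dir=TRACING_DIR):
    """Trace `function` and its children while action() runs; return the trace text."""
    write_control(tracing_dir, "set_graph_function", function)
    write_control(tracing_dir, "current_tracer", "function_graph")
    try:
        write_control(tracing_dir, "tracing_on", "1")
        try:
            action()
        finally:
            write_control(tracing_dir, "tracing_on", "0")
        # Read the buffer before the tracer is reset below.
        with open(os.path.join(tracing_dir, "trace")) as f:
            return f.read()
    finally:
        # Restore the no-op tracer and clear the function filter.
        write_control(tracing_dir, "current_tracer", "nop")
        write_control(tracing_dir, "set_graph_function", "")
```

For example, after `import subprocess`, calling `trace_function("virtio_dev_probe", lambda: subprocess.run(["modprobe", "transport_virtio"], check=True))` would correspond to the commands above.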

Here is some example output:

...
 0)               |        virtqueue_kick [virtio_ring]() {
 0) + 30.207 us   |          virtqueue_kick_prepare [virtio_ring]();
 0) + 13.342 us   |          vp_notify [virtio_pci]();
 0) + 90.315 us   |        }
 0) # 61946.45 us |      }
 0)   1.046 us    |      mutex_unlock();
 0) # 102833.9 us |    }
 0)   2.411 us    |    vp_get_status [virtio_pci]();
 0)   0.826 us    |    vp_get_status [virtio_pci]();
 0) ! 130.773 us  |    vp_set_status [virtio_pci]();
 0)               |    virtio_config_enable [virtio]() {
 0)   0.689 us    |      _raw_spin_lock_irq();
 0) + 33.796 us   |    }
 0) # 105349.9 us |  }
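
Long traces are easier to digest when sorted by duration. What follows is a rough sketch of a parser for function_graph output like the sample above. The regular expression is an assumption based on this sample rather than a complete grammar for the format, and the function name is hypothetical. Closing braces are matched to their opening line and reported as <unknown> when the opening line was cut off.

```python
import re

LINE_RE = re.compile(
    r"^\s*(?P<cpu>\d+)\)\s+"          # CPU number
    r"(?:[+!#*@$]\s+)?"               # optional overhead marker
    r"(?:(?P<us>\d+(?:\.\d+)?) us)?"  # optional duration in microseconds
    r"\s*\|\s*(?P<text>.*)$"          # function call or closing brace
)


def parse_durations(trace_text):
    """Return (microseconds, function) pairs for every line that has a duration."""
    results = []
    open_calls = []  # functions whose closing brace has not been seen yet
    for line in trace_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        text = m.group("text").strip()
        us = m.group("us")
        if text.endswith("{"):
            open_calls.append(text[:-1].strip())
        elif text.startswith("}"):
            # The opening line may be missing if the trace was truncated.
            name = open_calls.pop() if open_calls else "<unknown>"
            if us:
                results.append((float(us), name))
        elif us:
            results.append((float(us), text.rstrip(";")))
    return results
```

`sorted(parse_durations(text), reverse=True)[:5]` then gives the five longest entries.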

I haven't figured out whether set_graph_function can be used on functions whose kernel module has not been loaded yet.  I think the answer is no, but please let me know in the comments if there is a way to do it.

Wednesday, March 4, 2015

QEMU participating in Outreachy

I'm delighted that QEMU is able to participate in Outreachy May-August 2015.

Outreachy (formerly known as Outreach Program for Women) provides internships to people from groups underrepresented in open source.  The internship is a 12-week full-time paid software development project working on open source software.

QEMU is sharing project ideas between Outreachy and Google Summer of Code.  We encourage applicants to apply to both if they are eligible.

You can join the QEMU Outreachy IRC channel at #qemu-outreachy on irc.oftc.net.

Monday, March 2, 2015

QEMU accepted in Google Summer of Code 2015!

QEMU is participating in Google Summer of Code 2015.  I'm very excited that we are back for another great summer of students contributing to open source (with generous funding from Google).

QEMU's project ideas list is available here:
http://qemu-project.org/Google_Summer_of_Code_2015

Students, you may be interested in my advice for applying.

Good luck, students of 2015!

Tuesday, February 17, 2015

Slides posted for "KVM Architecture Overview: 2015 Edition"

I recently gave a talk on KVM's architecture.  It covers how hardware assisted virtualization works with KVM and explains key features of QEMU's architecture.

Check out the presentation to learn the basics of how KVM runs virtual machines and QEMU emulates devices.

Slides are available here (pdf).  There is no audio or video recording of this talk.

Sunday, February 1, 2015

Slides posted for "Observability in KVM: Troubleshooting virtual machines"

In my FOSDEM 2015 talk on Observability in KVM, I covered the basic tools and troubleshooting techniques for CPU, networking, and disk I/O problems in virtual machines.

My slides are now available here (PDF).

If you would like to learn the basics or get new ideas for troubleshooting with KVM, check them out.

Enjoy!