QEMU 1.4 includes an experimental feature called virtio-blk data plane that improves disk I/O scalability for high-IOPS workloads. It extends QEMU to perform disk I/O in a dedicated thread that is optimized for scaling to high-IOPS devices and many disks. IBM and Red Hat have published a whitepaper presenting the highest IOPS achieved to date under virtualization, using virtio-blk data plane:
KVM Virtualized I/O Performance [PDF]
Update
Much of this post is now obsolete! The virtio-blk dataplane feature was integrated with QEMU's block layer (live migration and block layer features are now supported), virtio-scsi dataplane support was added, and libvirt XML syntax was added.
If you have a RHEL 7.2 or later host please use the following:
QEMU syntax:
$ qemu-system-x86_64 -object iothread,id=iothread0 \
    -drive if=none,id=drive0,file=vm.img,format=raw,cache=none,aio=native \
    -device virtio-blk-pci,iothread=iothread0,drive=drive0
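Several disks can share one iothread, or each disk can be given its own dedicated thread. Here is a minimal sketch of the latter, assuming two hypothetical image files vm1.img and vm2.img:

$ qemu-system-x86_64 -object iothread,id=iothread0 -object iothread,id=iothread1 \
    -drive if=none,id=drive0,file=vm1.img,format=raw,cache=none,aio=native \
    -device virtio-blk-pci,iothread=iothread0,drive=drive0 \
    -drive if=none,id=drive1,file=vm2.img,format=raw,cache=none,aio=native \
    -device virtio-blk-pci,iothread=iothread1,drive=drive1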
Libvirt domain XML syntax:
<domain>
  <iothreads>1</iothreads>
  <cputune>  <!-- optional -->
    <iothreadpin iothread="1" cpuset="5,6"/>
  </cputune>
  <devices>
    <disk type="file">
      <driver iothread="1" ... />
    </disk>
  </devices>
</domain>
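If your libvirt is new enough you can also inspect and adjust the iothread at runtime with virsh. A sketch, assuming the guest is named vm1 (a placeholder):

$ virsh iothreadinfo vm1          # list IOThreads and their CPU affinity
$ virsh iothreadpin vm1 1 5,6     # pin IOThread 1 to host CPUs 5 and 6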
When can virtio-blk data plane be used?
Data plane is suitable for LVM or raw image file configurations where live migration and advanced block features are not needed. This covers many configurations where performance is the top priority.
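For example, a suitable backing store is either a raw image file or an LVM logical volume. A sketch of preparing one (the 20G size and the names vm.img, vg0, and vm0 are placeholders):

$ qemu-img create -f raw vm.img 20G      # raw image file
$ lvcreate -L 20G -n vm0 vg0             # or a raw LVM logical volume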
Data plane is still an experimental feature because it only supports a subset of QEMU configurations. The QEMU 1.4 feature has the following limitations:
- Image formats are not supported (qcow2, QED, etc.).
- Live migration is not supported.
- QEMU I/O throttling is not supported, but the cgroups blk-io controller can be used instead (see the sketch after this list).
- Only the default "report" I/O error policy is supported (-drive werror=,rerror=).
- Hot unplug is not supported.
- Block jobs (block-stream, drive-mirror, block-commit) are not supported.
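Since -drive throttling is bypassed, I/O limits have to be applied on the host instead. A minimal sketch using the cgroup v1 blk-io controller (the mount point, the 8:16 device number, the 10000 IOPS limit, and $QEMU_PID are examples; your distro may mount cgroups elsewhere or manage them through libvirt):

# as root: create a blkio cgroup and cap the disk's backing block device
$ mkdir /sys/fs/cgroup/blkio/vm1
$ echo "8:16 10000" > /sys/fs/cgroup/blkio/vm1/blkio.throttle.read_iops_device
$ echo "8:16 10000" > /sys/fs/cgroup/blkio/vm1/blkio.throttle.write_iops_device
$ echo "$QEMU_PID" > /sys/fs/cgroup/blkio/vm1/cgroup.procs   # move the QEMU process into the group

Here 8:16 is the major:minor number of the backing block device (see lsblk) and $QEMU_PID is the QEMU process ID.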
How to use virtio-blk data plane
The following libvirt domain XML enables virtio-blk data plane:
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
    ...
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='path/to/disk.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    ...
  </devices>
  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.virtio-disk0.scsi=off'/>
  </qemu:commandline>
  <!-- config-wce=off is not needed in RHEL 6.4 -->
  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.virtio-disk0.config-wce=off'/>
  </qemu:commandline>
  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.virtio-disk0.x-data-plane=on'/>
  </qemu:commandline>
</domain>
Note that <qemu:commandline> must be added directly inside <domain> and not inside a child tag like <devices>.
If you do not use libvirt the QEMU command-line is:
qemu -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=path/to/disk.img \
     -device virtio-blk,drive=drive0,scsi=off,config-wce=off,x-data-plane=on
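One way to confirm that the dedicated data plane thread is running is to list the QEMU process's threads on the host (a sketch; $QEMU_PID stands for the QEMU process ID and thread names vary between QEMU versions):

$ ps -T -p "$QEMU_PID"   # the data plane thread shows up as an extra thread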
What is the roadmap for virtio-blk data plane?
The limitations of virtio-blk data plane in QEMU 1.4 will be lifted in future releases. The goal I am working towards is for QEMU virtio-blk to simply use the data plane approach behind the scenes, at which point the x-data-plane option can be dropped.
Reaching the point where data plane becomes the default requires making the QEMU event loop and all of the core infrastructure thread-safe. In the past there has been a big lock that allowed a lot of code to simply ignore multi-threading. This creates scalability problems that data plane avoids by using a dedicated thread. Work is underway to reduce the scope of the big lock and allow the data plane thread to work with live migration and other QEMU features that are not yet supported.
Patches have also been posted upstream to convert the QEMU net subsystem and virtio-net to data plane. This demonstrates the possibility of converting other performance-critical devices.
With these developments happening, 2013 will be an exciting year for QEMU I/O performance.