Saturday, September 17, 2016

Making I/O deterministic with blkdebug breakpoints

Recently I was looking for a deterministic way to reproduce a QEMU bug that occurred when a guest reboots while an IDE/ATA TRIM (discard) request is in flight. You can find the bug report here.

A related problem is how to trigger the code path where request A is in flight when request B is issued. Being able to do this is useful for writing test suites that check I/O requests interact correctly with each other.

Both of these scenarios require the ability to put an I/O request into a specific state and keep it there. This makes the request deterministic so there is no chance of it completing too early.

QEMU has a disk I/O error injection framework called blkdebug. Block drivers like qcow2 tell blkdebug about request states, making it possible to fail or suspend requests at certain points. For example, it's possible to suspend an I/O request when qcow2 decides to free a cluster.

The blkdebug documentation mentions the break command that can suspend a request when it reaches a specific state.

Here is how we can suspend a request when qcow2 decides to free a cluster:

$ qemu-system-x86_64 -drive if=ide,id=ide-drive,file=blkdebug::test.qcow2,format=qcow2
(qemu) qemu-io ide-drive "break cluster_free A"

QEMU prints a message when the request is suspended. It can be resumed with:

(qemu) qemu-io ide-drive "resume A"

The tag name 'A' is arbitrary and you can pick your own name or even suspend multiple requests at the same time.

Automated tests need to wait until a request has suspended. This can be done with the wait_break command:

(qemu) qemu-io ide-drive "wait_break A"

For another example of blkdebug breakpoints, see the qemu-iotests 046 test case.