Saturday, 12 March 2011

How to write trace analysis scripts for QEMU

This post shows how to write a Python script that finds overlapping disk writes in a QEMU simple trace file.

Several trace backends, including SystemTap and LTTng Userspace Tracer, are supported by QEMU. The built in "simple" trace backend often gives the best bang for the buck because it does not require installing additional software and is easy to use for developers. See my earlier post for an overview of QEMU tracing.

The simple trace backend has recently been enhanced with a Python module for analyzing trace files. This makes it easy to write scripts that post-process trace files and extract useful information. Take this example from the commit that introduced the simpletrace module:

#!/usr/bin/env python
# Print virtqueue elements that were never returned to the guest.

import simpletrace

class VirtqueueRequestTracker(simpletrace.Analyzer):
    def __init__(self):
        self.elems = set()

    def virtqueue_pop(self, vq, elem, in_num, out_num):
        self.elems.add(elem)

    def virtqueue_fill(self, vq, elem, length, idx):
        self.elems.remove(elem)

    def end(self):
        for elem in self.elems:
            print hex(elem)

simpletrace.run(VirtqueueRequestTracker())

This script tracks virtqueue_pop and virtqueue_fill operations and prints out the elements that were popped but never filled back, which indicates elements have leaked.

The model of an analysis script is similar to awk. Trace records are processed from the input file by invoking methods on the user's simpletrace.Analyzer object. The analyzer object does not have to supply methods for all possible trace events, it can just implement those that it wants to know about. Trace events that have no dedicated method cause the catchall() method to be invoked, if provided.

Tracing disk write operations


Let's write a slightly fancier script that finds disk writes that overlap a given range. I've needed to perform disk write overlap queries in the past when debugging image formats, so this is a useful script to have. QEMU's block layer write function looks like this:

BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
                                  QEMUIOVector *iov, int nb_sectors,
                                  BlockDriverCompletionFunc *cb, void *opaque);

The block device is called bs and sectors are 512 bytes. Conveniently there is already a trace event for bdrv_aio_write so we just need to enable it in the trace-events file:

disable bdrv_aio_writev(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"

Open the trace-events file and remove the disable keyword from the bdrv_aio_write trace event. Then rebuild QEMU like this:

$ ./configure --enable-trace-backend=simple # ... plus your usual options
$ make

Next time you run QEMU a trace file named trace-$PID will be created in the current working directory. The file is in binary and can be parsed using the simpletrace Python module or pretty-printed like this:

$ scripts/simpletrace.py trace-events trace-12345 # replace "trace-12345" with the actual filename

Finding overlapping disk writes


Here is the usage information for a script to find disk writes that overlap a given range:

usage: find_overlapping_writes.py <trace-events> <trace-file> <bs> <sector-num> <nb-sectors>

The script only considers writes to a specific block device, bs. That means all disk I/O to other block devices is ignored.

For example, let's find writes to a 1 MB region at offset 2 MB of the BlockDriverState 0x29c8180:
$ ./find_overlapping_writes.py trace-events trace-19129 0x29c8180 4096 2048
0xe40+664 opaque=0x2eb4190
0x10d8+3032 opaque=0x2eb41d0

Here is the code:
#!/usr/bin/env python
import sys
import simpletrace

def intersects(a_start, a_len, b_start, b_len):
    return not (a_start + a_len <= b_start or \
                b_start + b_len <= a_start)

class OverlappingWritesAnalyzer(simpletrace.Analyzer):
    def __init__(self, bs, sector_num, nb_sectors):
        self.bs = bs
        self.sector_num = sector_num
        self.nb_sectors = nb_sectors

    def bdrv_aio_writev(self, bs, sector_num, nb_sectors, opaque):
        if bs != self.bs:
            return
        if intersects(self.sector_num, self.nb_sectors, sector_num, nb_sectors):
            print '%#x+%d opaque=%#x' % (sector_num, nb_sectors, opaque)

if len(sys.argv) != 6:
    sys.stderr.write('usage: %s <trace-events> <trace-file> <bs> <sector-num> <nb-sectors>\n' %
                     sys.argv[0])
    sys.exit(1)

trace_events, trace_file, bs, sector_num, nb_sectors = sys.argv[1:]
bs = int(bs, 0)
sector_num = int(sector_num, 0)
nb_sectors = int(nb_sectors, 0)

analyzer = OverlappingWritesAnalyzer(bs, sector_num, nb_sectors)
simpletrace.process(trace_events, trace_file, analyzer)

The core of the script is the OverlappingWritesAnalyzer that checks bdrv_aio_writev events for intersection with the given range.

This script is longer than the virtqueue leak detector example above because it parses command-line arguments. The simpletrace.run() function used by the leak detector handles the default trace events and trace file arguments for you. So scripts that take no special command-line arguments can use simpletrace.run(), which also prints usage information automatically. For overlapping writes we really need our own command-line arguments so the slightly lower-level simpletrace.process() function is used.

Where to find out more


There is more information about the simpletrace module in the doc comments, so the simplest way to get started is:
$ cd qemu
$ PYTHONPATH=scripts python
>>> import simpletrace
>>> help(simpletrace)

Another example of how to use the simpletrace module is the trace file pretty-printer which is included as part of the scripts/simpletrace.py source code itself!

Feel free to leave questions or comments about the simple trace backend and the simpletrace Python module.

18 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
  2. Hi Stefan, thanks a lot for your explanation on everything regarding QEMU.
    I am trying to do some instrumentation on QEMU io operations using systemtap as the back end. But I couldn't find any good examples or documents which explains the available probe points in QEMU for using system tap. Can you please provide me some input on where to get started with systemtap tracing on QEMU or any available documents online?

    -Toms

    ReplyDelete
    Replies
    1. Hi Toms,
      See the ./trace-events file in the QEMU source tree. A SystemTap tapset is built from this file (e.g. qemu-system-x86_64.stp) and provides probes your scripts can use.

      For general technical questions about QEMU, please email qemu-devel@nongnu.org.

      Delete
    2. Hi Stefan,
      Thanks a lot for the help. I got the probe points. But currently, I am stuck trying to probe qemu.system.x86_64.bdrv_aio_readv event in qemu. This probe is not giving any output. None of the available qemu probes are giving any output. I am using libvirt to manage the VMs and hence, it is executing qemu-system-x86_64 and system tap is unable trace an already running process directly I guess. Is there anyway to use system tap on an already running qemu instance? Otherwise, can we get the qemu command line instruction from libvirt so that I can execute the VM (qemu) directly as part of my systemtap script using -c flag?
      I am basically trying to monitor the accesses to host file which is used as a guest hard disk. Hence, trying to probe bdrv_aio_readv().

      - Toms

      Delete
    3. SystemTap can probe running processes. I'm afraid you'll just have to troubleshoot it or look at examples in the SystemTap manual. Make sure the probe name exactly matches the name in the tapset.

      Delete
    4. Thanks a lot for your help. I got it all working. The qemu executable path in the tapset file was wrong :)

      - Toms

      Delete
  3. Hi Stefan, I have another question. Where does the output of printf statements in QEMU source go? I assume that it goes to the terminal which executes qemu. But since I am using libvirt to manage my VMs, I have no idea about the exact qemu command used along with its arguments. In such a case, is there any way to get the output of these printfs?

    - Toms

    ReplyDelete
    Replies
    1. Please send general questions to the QEMU mailing list at qemu-devel@nongnu.org since they are not directly related to this blog post and could benefit from discussion in the QEMU community.

      Use fprintf to stderr for warnings/errors/debug output. Libvirt keeps the stderr output at /var/lib/libvirt/qemu/.log.

      Delete
    2. Thanks for the reply. I will ask the rest to the email id you mentioned.

      Delete
  4. This comment has been removed by the author.

    ReplyDelete
  5. Good informative post, thanks for sharing.

    ReplyDelete
  6. This comment has been removed by the author.

    ReplyDelete
  7. This comment has been removed by the author.

    ReplyDelete
  8. This comment has been removed by the author.

    ReplyDelete
  9. A Question.
    Hello Stefan, Just gut to know your blog, very impressive and for sure I will spend time to get myself some quality time on understanding the underlying of the virtual world.
    I have yet to gone through enough,
    env: virsh start VM
    emulator: qemu-system-X86_64.
    goal: trace linux via simple trace.
    xml args:
    qemu:arg value='-trace'
    qemu:arg value='events=/home/qemu_traced_symbols'/>

    Quesions:
    1. I am only getting one trace.{pid} for main qemu listener and not for each VCPU
    2. while trying to read this trace I am getting.
    run(Formatter())
    File "./simpletrace.py", line 169, in run
    events = _read_events(open(sys.argv[1], 'r'))
    File "tracetool/__init__.py", line 291, in _read_events
    event = Event.build(line)
    File "tracetool/__init__.py", line 220, in build
    assert m is not None
    AssertionError

    what am I possibly doing wrong?
    Regards,
    Nir.

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. > 1. I am only getting one trace.{pid} for main qemu listener and not for each VCPU

      The traces from all threads are contained in the file. This includes the main loop thread, vcpus, and other threads.

      > File "tracetool/__init__.py", line 220, in build
      > assert m is not None
      > AssertionError

      Looks like your trace-events-all (previously named just trace-events) file contains invalid syntax or you passed the wrong file. It must be invoked like this:

      $ scripts/simpletrace.py <trace-events-all> <trace-file>

      I have tested that qemu.git/master (ca3d87d4c8) works:

      $ cat my-events
      qemu_memalign
      $ x86_64-softmmu/qemu-system-x86_64 -trace events=my-events
      $ scripts/simpletrace.py trace-events-all trace-12190
      qemu_memalign 0.000 pid=12190 alignment=0x40 size=0x80000 ptr=0x7fccbb604040
      qemu_memalign 13230.593 pid=12190 alignment=0x200 size=0x200 ptr=0x5590b3109e00
      ...

      If you would like help, please post:
      1. The qemu.git commit SHA1 that your tree is based off.
      2. The simpletrace.py command-line you used.
      3. The contents of the trace-events-all file (previously named just trace-events).

      Delete
    3. Thank Stefan.

      Indeed I have used the syntax incorect and placed my /home/qemu_traced_symbols instead of the trace-events located at scripts folder.

      now I see the trace and learn it.
      I wonder if the delta times can be replaced with current ticks.
      current goal: trace qemu-kvm latencies.

      Appreciate your time and support.
      Nir.

      Delete