Wednesday, 23 February 2011

Observability using QEMU tracing

In this post I describe the tracing feature in QEMU and KVM.

Overview of QEMU tracing


Tracing is available for the first time in QEMU 0.14.0 and qemu-kvm 0.14.0. It's an optional feature and may not be enabled in distro packages yet, but it's there if you are willing to build from source.

QEMU tracing is geared towards answering questions about running virtual machines:
  • What I/O accesses are being made to emulated devices?
  • How long are disk writes taking to complete inside QEMU?
  • Is QEMU leaking memory or other resources by not freeing them?
  • Are network packets being received but filtered at the QEMU level?

In order to find answers to these questions we place trace events into the QEMU source code at strategic points. For example, every qemu_malloc() and qemu_free() call can be traced so we know what heap memory allocations are going on.
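To illustrate how traced allocation events can answer the memory leak question, here is a minimal Python sketch. It assumes the trace has already been parsed into simple records; the `Event` type, its fields, and the sample addresses are hypothetical and do not reflect QEMU's actual trace file format. Only the event names qemu_malloc and qemu_free come from the text above.

```python
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical parsed trace record (not QEMU's real trace format)."""
    name: str  # trace event name, e.g. "qemu_malloc"
    ptr: int   # address returned by the allocator or passed to free
    size: int  # allocation size in bytes (0 for free events)


def find_leaks(events: Iterable[Event]) -> Dict[int, int]:
    """Return {ptr: size} for every allocation that was never freed."""
    live: Dict[int, int] = {}
    for ev in events:
        if ev.name == "qemu_malloc":
            live[ev.ptr] = ev.size
        elif ev.name == "qemu_free":
            # Frees of unknown pointers are ignored in this sketch;
            # a real tool might want to report them instead.
            live.pop(ev.ptr, None)
    return live


# Made-up sample trace: two allocations, only the first one is freed.
trace = [
    Event("qemu_malloc", 0x1000, 64),
    Event("qemu_malloc", 0x2000, 128),
    Event("qemu_free", 0x1000, 0),
]
leaks = find_leaks(trace)
for ptr, size in leaks.items():
    print(f"possible leak: {size} bytes at {ptr:#x}")
```

Anything still in the result when the trace ends is only a candidate leak, since the memory may simply still be in use.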

Current status


Today QEMU tracing is useful to developers and anyone troubleshooting or investigating bugs.

The set of trace events that comes with QEMU is limited but already useful for observing the block layer and certain emulated hardware. Developers are adding trace events to new code and converting some existing debug printfs to trace events. I expect the default set of trace events to grow and become more useful in the future.

Trace events are currently not a stable API, so scripts that work with one version of QEMU are not guaranteed to work with another. There is also no documentation on the semantics of particular trace events, so you need to understand the code that contains a trace event to know its meaning. In the future we can make stable trace events with explicit semantics like "packet received from host".

QEMU tracing cross-platform support


You have a choice of trace backends: SystemTap, LTTng Userspace Tracer, and a built-in "simple" tracer are supported. DTrace could be added with little effort on Solaris, Mac OS X, and FreeBSD host platforms.

The available set of trace events is the same no matter which trace backend you choose.

Where to find out more


If you want to get started, check out the documentation that comes as part of QEMU.

Also check out the excellent QEMU 0.14.0 changelog for pointers related to tracing.

I am looking forward to writing more about tracing in the future and sharing trace analysis scripts. In fact, I just submitted a patch to provide a Python API for processing trace files generated by the "simple" trace backend. It makes analyzing trace files quick and fun :).

15 comments:

  1. Hello Stefan, it's a very useful post. Thanks for sharing this. I wanted to know how one can trace what happens when an instruction like "INVLPG" is executed by the guest OS. How exactly can one trace what code path is executed, and what tools are required to trace such a processor-level instruction executed by a guest OS on top of QEMU?

    1. It depends on the instruction. Many instructions are not trapped by the KVM kernel module or TCG generated code. Therefore you cannot trace them easily because they are executed directly by the host CPU. The advantage is that they execute quickly because there is no overhead.

      You need to investigate each instruction you are interested in. Check the QEMU source code and check the KVM kernel module source code (if you are using KVM). In the KVM case you may also want to check the Intel manuals to understand whether the instruction traps out of guest mode.

    2. Thanks for the quick response. But what if QEMU is used without KVM support? How can one figure out the code flow for an instruction like "INVLPG"? I looked at the QEMU source code and could see that "INVLPG" is handled in target-i386/translate.c and target-i386/misc_helper.c (the helper_invlpg function), but I could not find the exact code path to this function even after attaching GDB to QEMU.

    3. "Helper" means a function that is called from generated code. So you could add a trace event to helper_invlpg() and it fires on every guest invlpg instruction.

    4. Great. So one can use SystemTap to get a trace for helper_invlpg() and can also get a call sequence for the same. Thanks a lot for this information.

  2. Hello Stefan,
    I want to trace calls to NEON helpers in the ARM backend. I have the linaro/qemu git tree and some C code that generates lots of vadd.i32 instructions. How can I find out which function is used to translate the instruction in the ARM backend?

    1. Please email qemu-devel@nongnu.org and CC Lluís Vilanova.

  3. Hello Stefan,

    I want to set up an ftrace-based mechanism to understand flows within the guest and host simultaneously. Are there mechanisms available to get traces from the guest and host and then collate them into a merged trace that shows the exact flow between guest and host? Or to generate a trace file that combines the traces from the running guest and host by default? How can I set up such an environment, and what QEMU options can I specify for this to work?

    Thanks

    1. I don't think there is an out-of-the-box guest+host tracing solution but you can combine guest and host ftrace (based on the timestamps).
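As a rough illustration of combining two traces by timestamp, here is a minimal Python sketch. It assumes each trace has already been parsed into (timestamp, message) pairs sorted by time, and that guest and host timestamps are on a comparable clock. The `guest_offset` parameter is a hypothetical correction for clock skew, and the guest event names in the sample are made up.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Record = Tuple[float, str]  # (timestamp in seconds, message)


def merge_traces(
    guest: Iterable[Record],
    host: Iterable[Record],
    guest_offset: float = 0.0,
) -> Iterator[Tuple[float, str, str]]:
    """Yield (timestamp, source, message) in timestamp order.

    Both inputs must already be sorted by timestamp. guest_offset is
    added to guest timestamps to line them up with the host clock.
    """
    guest_tagged = ((ts + guest_offset, "guest", msg) for ts, msg in guest)
    host_tagged = ((ts, "host", msg) for ts, msg in host)
    # heapq.merge lazily interleaves sorted inputs without loading
    # either trace fully into memory.
    return heapq.merge(guest_tagged, host_tagged)


# Made-up sample data for illustration only.
guest = [(1.0, "sys_write"), (3.0, "irq_handler")]
host = [(2.0, "kvm_exit"), (4.0, "kvm_entry")]

merged = list(merge_traces(guest, host))
for ts, source, msg in merged:
    print(f"{ts:.6f} [{source}] {msg}")
```

How well this works in practice depends on how closely the guest and host clocks agree, which this sketch does not address.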

    2. Hi,
      take a look at this link:
      http://link.springer.com/article/10.1186/s13677-014-0023-3

  4. This comment has been removed by the author.

  5. Hi Stefan,
    Do you have any results on the latency caused by QEMU tracing?

    1. QEMU supports several tracers including SystemTap, ftrace, LTTng UST, and logging. QEMU itself doesn't really add overhead but these tracers have different overheads.

      For example, SystemTap static probes are implemented using a software interrupt/breakpoint that transfers control to the kernel on each trace event. This is more heavyweight than the LTTng UST tracer's shared memory mechanism.

      If you are concerned about tracing overhead, you can measure the difference it makes to your benchmark or application performance by building QEMU with different tracers (e.g. ./configure --enable-trace-backends=dtrace vs ./configure --enable-trace-backends=ftrace).

  6. Hi Stefan,
    Do you have any suggestions for tracing instructions that run without an operating system on QEMU?

    1. If you want to observe guest instruction execution you can use the GDB stub (see -gdb and -s command-line options) or the TCG debug logs (see -d command-line option). They are documented on the qemu man page.
