Saturday, 29 July 2017

Tracing userspace static probes with perf(1)

The perf(1) tool added support for userspace static probes in Linux 4.8. Userspace static probes are pre-defined trace points in userspace applications. Application developers add them so frequently needed lifecycle events are available for performance analysis, troubleshooting, and development.

Static userspace probes are more convenient than defining your own function probes from scratch. You can save time by using them and not worrying about where to add probes because that has already been done for you.

On my Fedora 26 machine the QEMU, gcc, and nodejs packages ship with static userspace probes. QEMU offers probes for vcpu events, disk I/O activity, device emulation, and more.

Without further ado, here is how to trace static userspace probes with perf(1)!

Scan the binary for static userspace probes

The perf(1) tool needs to scan the application's ELF binaries for static userspace probes and store the information in $HOME/.debug/usr/:

# perf buildid-cache --add /usr/bin/qemu-system-x86_64

List static userspace probes

Once the ELF binaries have been scanned you can list the probes as follows:

# perf list sdt_*:*

List of pre-defined events (to be used in -e):

  sdt_qemu:aio_co_schedule                           [SDT event]
  sdt_qemu:aio_co_schedule_bh_cb                     [SDT event]
  sdt_qemu:alsa_no_frames                            [SDT event]

Let's trace something!

First add probes for the events you are interested in:

# perf probe sdt_qemu:blk_co_preadv
Added new event:
  sdt_qemu:blk_co_preadv (on %blk_co_preadv in /usr/bin/qemu-system-x86_64)

You can now use it in all perf tools, such as:

 perf record -e sdt_qemu:blk_co_preadv -aR sleep 1

Then capture trace data as follows:

# perf record -a -e sdt_qemu:blk_co_preadv
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 2.274 MB (4714 samples) ]

The trace can be printed using perf-script(1):

# perf script
 qemu-system-x86  3425 [000]  2183.218343: sdt_qemu:blk_co_preadv: (55d230272e4b) arg1=94361280966400 arg2=94361282838528 arg3=0 arg4=512 arg5=0
 qemu-system-x86  3425 [001]  2183.310712: sdt_qemu:blk_co_preadv: (55d230272e4b) arg1=94361280966400 arg2=94361282838528 arg3=0 arg4=512 arg5=0
 qemu-system-x86  3425 [001]  2183.310904: sdt_qemu:blk_co_preadv: (55d230272e4b) arg1=94361280966400 arg2=94361282838528 arg3=512 arg4=512 arg5=0

If you want to get fancy it's also possible to write trace analysis scripts with perf-script(1). That's a topic for another post but see the --gen-script= option to generate a skeleton script.

Current limitations

As of July 2017 there are a few limitations to be aware of:

Probe arguments are automatically numbered and do not have human-readable names. You will see arg1, arg2, etc and will need to reference the probe definition in the application source code to learn the meaning of the argument. Some versions of perf(1) may not even print arguments automatically since this feature was added later.

The contents of string arguments are not printed, only the memory address of the string.

Probes called from multiple call-sites in the application result in multiple perf probes. For example, if probe foo is called from 3 places you get sdt_myapp:foo, sdt_myapp:foo_1, and sdt_myapp:foo_2 when you run perf probe --add sdt_myapp:foo.

The SystemTap semaphores feature is not supported and such probes will not fire unless you manually set the semaphore inside your application or from another tool like GDB. This means that the sdt_myapp:foo will not fire if the application uses the MYAPP_FOO_ENABLED() macro like this: if (MYAPP_FOO_ENABLED()) MYAPP_FOO();.

Some history and alternative tools

Static userspace probes were popularized by DTrace's <sys/sdt.h> header. Tracers that came after DTrace implemented the same interface for compatibility.

On Linux the initial tool for static userspace probes was SystemTap. In fact, the <sys/sdt.h> header file on my Fedora 26 system is still part of the systemtap-sdt-devel package.

More recently the GDB debugger gained support for static userspace probes. See the Static Probe Points documentation if you want to use userspace static probes from GDB.


It's very handy to have static userspace probing available alongside all the other perf(1) tracing features. There are a few limitations to keep in mind but if your tracing workflow is based primarily around perf(1) then you can now begin using static userspace probes without relying on additional tools.

Thursday, 13 July 2017

Packet capture coming to AF_VSOCK

For anyone interested in the AF_VSOCK zero-configuration host<->guest communications channel it's important to be able to observe traffic. Packet capture is commonly used to troubleshoot network problems and debug networking applications. Up until now it hasn't been available for AF_VSOCK.

In 2016 Gerard Garcia created the vsockmon Linux driver that enables AF_VSOCK packet capture. During the course of his excellent Google Summer of Code work he also wrote patches for libpcap, tcpdump, and Wireshark.

Recently I revisited Gerard's work because Linux 4.12 shipped with the new vsockmon driver, making it possible to finalize the userspace support for AF_VSOCK packet capture. And it's working beautifully:

I have sent the latest patches to the tcpdump and Wireshark communities so AF_VSOCK can be supported out-of-the-box in the future. For now you can also find patches in my personal repositories:

The basic flow is as follows:

# ip link add type vsockmon
# ip link set vsockmon0 up
# tcpdump -i vsockmon0
# ip link set vsockmon0 down
# ip link del vsockmon0

It's easiest to wait for distros to package Linux 4.12 and future versions of libpcap, tcpdump, and Wireshark. If you decide to build from source, make sure to build libpcap first and then tcpdump or Wireshark. The libpcap dependency is necessary so that tcpdump/Wireshark can access AF_VSOCK traffic.

Monday, 6 February 2017

Slides posted for "Using NVDIMM under KVM" talk

I gave a talk on NVDIMM persistent memory at FOSDEM 2017. QEMU has gained support for emulated NVDIMMs and they can be used efficiently under KVM.

Applications inside the guest access the physical NVDIMM directly with native performance when properly configured. These devices are DDR4 RAM modules so the access times are much lower than solid state (SSD) drives. I'm looking forward to hardware coming onto the market because it will change storage and databases in a big way.

This talk covers what NVDIMM is, the programming model, and how it can be used under KVM. Slides are available here (PDF).

Update: Video is available here.