Saturday, February 15, 2020

An introduction to GDB scripting in Python

Sometimes it's not humanly possible to inspect or modify data structures manually in a debugger because they are too large or complex to navigate. Think of a linked list with hundreds of elements, one of which you need to locate. Finding the needle in the haystack is only possible by scripting the debugger to automate repetitive steps.

This article gives an overview of the GNU Debugger's Python scripting support so that you can tackle debugging tasks that are not possible manually.

What scripting GDB in Python can do

GDB can load Python scripts to automate debugging tasks and to extend debugger functionality. I will focus mostly on automating debugging tasks, although extending the debugger is also very powerful, if rarely used.

Say you want to search a linked list for a particular node:

(gdb) p node.next
...
(gdb) p node.next.next
...
(gdb) p node.next.next.next

Doing this manually becomes impractical once the list has more than a handful of elements. GDB's scripting support lets you automate the task with a script that executes debugger commands and interprets the results.

Loading Python scripts

The source GDB command executes files ending with the .py extension in a Python interpreter. The interpreter has access to the gdb Python module that exposes debugging APIs so your script can control GDB.

$ cat my-script.py
print('Hi from Python, this is GDB {}'.format(gdb.VERSION))
$ gdb
(gdb) source my-script.py
Hi from Python, this is GDB Fedora 8.3.50.20190824-28.fc31

Notice that the gdb module is already imported. See the GDB Python API documentation for full details of this module.

It's also possible to run ad-hoc Python commands from the GDB prompt:

(gdb) py print('Hi')
Hi

Executing commands

GDB commands are executed using gdb.execute(command, from_tty, to_string), where from_tty and to_string are optional and default to False. For example, gdb.execute('step') runs the step command. By default output goes to the interactive GDB session, but it can be collected as a Python string instead by setting to_string to True.

Although gdb.execute() is fundamental to GDB scripting, at best it allows screen-scraping (interpreting the output string) rather than a Pythonic way of controlling GDB. There is actually a full Python API that represents the debugged program's types and values in Python. Most scripts will use this API instead of simply executing GDB commands as if simulating an interactive shell session.
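As a rough sketch of the screen-scraping approach, the following script captures the output of the backtrace command with to_string=True and counts the stack frames in it. The helper names frame_count and current_frame_count are made up for this illustration, and the parsing assumes that every frame line in the output starts with '#', which is how GDB normally prints backtraces. The import is wrapped in try/except only so that the parsing helper can be tried outside GDB.

```python
try:
    import gdb  # only available inside GDB's embedded Python interpreter
except ImportError:
    gdb = None  # lets the parsing helper below be exercised outside GDB


def frame_count(backtrace_text):
    """Count stack frames in the text printed by GDB's backtrace command.

    Assumption: each frame line starts with '#', e.g. '#0  main () at a.c:3'.
    """
    return sum(1 for line in backtrace_text.splitlines()
               if line.startswith('#'))


def current_frame_count():
    """Run 'backtrace' in GDB, capture its output, and count the frames."""
    output = gdb.execute('backtrace', from_tty=False, to_string=True)
    return frame_count(output)
```

After loading the file with the source command, py print(current_frame_count()) at the GDB prompt would report the depth of the current stack. It also shows the weakness of screen-scraping: the script breaks as soon as the output format differs from what it expects.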

Navigating program variables

The entry point to navigating program variables is gdb.parse_and_eval(expression). It returns a gdb.Value.

When a gdb.Value is a struct its fields can be indexed using value['field1']['child_field1'] syntax. The following example iterates a linked list:

elem = gdb.parse_and_eval('block_backends.tqh_first')
while elem:
    name = elem['name'].string()
    if name == 'drive2':
        print('Found {}'.format(elem['dev']))
        break
    elem = elem['link']['tqe_next']

This script iterates the block_backends linked list and checks the name field of each element against "drive2". When it finds "drive2" it prints the dev field of that element.
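If several lists need to be searched, the loop can be wrapped in a generator. This is only a sketch: walk_tailq and find_by_name are hypothetical helper names, and it assumes the same layout as the example above, where each element has a name field and a link field containing tqe_next, and where a NULL pointer evaluates as false.

```python
def walk_tailq(first, link_field='link'):
    """Yield each element of a list, starting from its first element.

    Assumption: elem[link_field]['tqe_next'] points to the next element
    and a NULL pointer is falsy, as in the example above.
    """
    elem = first
    while elem:
        yield elem
        elem = elem[link_field]['tqe_next']


def find_by_name(first, wanted, link_field='link'):
    """Return the first element whose name field equals wanted, or None."""
    for elem in walk_tailq(first, link_field):
        if elem['name'].string() == wanted:
            return elem
    return None
```

Inside GDB this could be used as find_by_name(gdb.parse_and_eval('block_backends.tqh_first'), 'drive2'). Keeping the traversal separate from the search condition means the same generator can be reused to count elements or print every name.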

There is a lot more that GDB Python scripts can do but you'll have to check out the API documentation to learn about it.

Conclusion

Python scripts can automate tedious debugging tasks in GDB. Having the full power of Python and access to file I/O, HTTP requests, etc. means pretty much any debugging task can be turned into a full-blown program. A subset of this was possible in the past through GDB command scripts, but Python is a much more flexible programming language and is familiar to many developers (more so than GDB's own looping and logic commands!).

Monday, February 10, 2020

Video for "virtio-fs: a shared file system for virtual machines" at FOSDEM '20 now available

The video and slides from my virtio-fs talk at FOSDEM '20 are now available!

virtio-fs is a shared file system that lets guests access a directory on the host. It can be used for many things, including secure containers, booting from a root directory, and testing code inside a guest.

The talk explains how virtio-fs works, including the Linux FUSE protocol that it's based on and how FUSE concepts are mapped to VIRTIO.

virtio-fs guest drivers have been available since Linux v5.4 and QEMU support will be available from QEMU v5.0 onwards.

Video (webm) (mp4)

Slides (PDF)

Sunday, February 9, 2020

Why CPU Utilization Metrics are Confusing

How much CPU is being used? Intuitively we would like to know the percentage of time being consumed. Popular utilities like top(1) and virt-top(1) do show percentages but the numbers can be weird. This post goes into how CPU utilization is accounted and why the numbers can be confusing.

Tools sometimes show CPU utilizations above 100%. Or we know a virtual machine is consuming all its CPU but only 12% CPU utilization is reported. Comparing CPU utilization metrics from different tools often reveals that the numbers they report are wildly different. What's going on?

How CPU Utilization is Measured

Imagine we want to measure the CPU utilization of an application on a simple computer with one CPU. Each time the application is scheduled on the CPU we record the time until it is next descheduled. The utilization is calculated by dividing the total CPU time that the application ran by the time interval being measured:

$$U = \frac{1}{T} \sum_{i=1}^{n} t_i$$

Here t_i is the execution time for the i-th of the n times the application was scheduled and T is the time interval being measured (e.g. 1 second).
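The calculation described above can be sketched in a few lines of Python. The function name cpu_utilization and the sample run times are invented for illustration.

```python
def cpu_utilization(run_times, interval):
    """Return utilization as a fraction: total time on the CPU / interval."""
    if interval <= 0:
        raise ValueError('interval must be positive')
    return sum(run_times) / interval


# The application was scheduled three times during a 1 second interval.
print('{:.0%}'.format(cpu_utilization([0.25, 0.125, 0.125], 1.0)))  # 50%
```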

So far, so good. This is how CPU utilization percentages should work. Now let's look at why the percentages can be confusing.

CPU Utilization on Multi-Processor Systems

Modern computers from mobile phones to laptops to servers typically have multiple logical CPUs. They are called logical CPUs because they appear as a CPU to software regardless of whether they are implemented as a socket, a core, or an SMT hardware thread.

On multi-processor systems we need to adapt the CPU utilization formula to account for CPUs running in parallel. There are two ways to do this:

  1. Treat 100% as full utilization of all CPUs. top(1) calls this Solaris mode.
  2. Treat 100% as full utilization of one CPU. top(1) calls this Irix mode.

By default top(1) reports CPU utilization in Irix mode and virt-top(1) reports Solaris mode.

The implications of Solaris mode are that a single CPU being fully utilized is only reported as 1/N CPU utilization where N is the number of CPUs. On a system with a large number of CPUs the utilization percentages can be very low even though some CPUs are fully utilized. Even on my laptop with 4 logical CPUs that means a single-threaded application consuming a full CPU only reports 25% CPU utilization.

Irix mode produces more intuitive 0-100% numbers for single-threaded applications but multi-threaded applications may consume multiple CPUs and therefore exceed 100%, which looks a bit funny.
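To make the difference concrete, here is a small sketch that computes both percentages from the same measurement. The function names are hypothetical and this is not how top(1) is implemented; it only follows the two definitions given above.

```python
def irix_percent(cpu_seconds, interval):
    """100% means one CPU fully utilized."""
    return 100.0 * cpu_seconds / interval


def solaris_percent(cpu_seconds, interval, num_cpus):
    """100% means all CPUs fully utilized."""
    return irix_percent(cpu_seconds, interval) / num_cpus


# Single-threaded application using a full CPU for 1 second on 4 CPUs.
print(irix_percent(1.0, 1.0), solaris_percent(1.0, 1.0, 4))  # 100.0 25.0

# Three busy threads on the same machine consume 3 CPU-seconds per second.
print(irix_percent(3.0, 1.0), solaris_percent(3.0, 1.0, 4))  # 300.0 75.0
```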

Confused?

Since there are two ways of accounting CPU utilization on multi-processor systems it is always necessary to know which method is being used. A percentage on its own is meaningless and might be misinterpreted.

This also explains why numbers reported by different tools can be so vastly different. It is necessary to check which accounting method is being used by both tools.

Documentation (and source code) often sheds light on which accounting method is used, but another way to check is by running a process that consumes a full CPU and then observing the CPU utilization that is reported. This can be done by running while true; do true; done in a shell and checking the CPU utilization numbers that are reported.
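The same experiment can be sketched in Python. The script below spins until it has consumed a given amount of CPU time and then divides that by the wall-clock time that elapsed. The function name and the 0.1 second default are arbitrary choices for this sketch. On an otherwise idle machine the ratio should be close to 1.0, which is the number to compare against whatever the tool under test reports for the process.

```python
import time


def busy_loop_utilization(cpu_seconds=0.1):
    """Spin until cpu_seconds of CPU time are used; return CPU time / wall time."""
    start_wall = time.monotonic()
    start_cpu = time.process_time()
    while time.process_time() - start_cpu < cpu_seconds:
        pass  # burn CPU, like 'while true; do true; done' in a shell
    cpu = time.process_time() - start_cpu
    wall = time.monotonic() - start_wall
    return cpu / wall


if __name__ == '__main__':
    print('utilization of one CPU: {:.2f}'.format(busy_loop_utilization()))
```

For a real comparison, raise cpu_seconds so the process stays busy long enough to show up in the tool being checked.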

virt-top(1) has another peculiarity that must be taken into account. Its formula divides CPU time consumed by a guest by the total CPU time available on the host. If the guest has 4 vCPUs but the host has 8 physical CPUs, then the guest can only ever reach 50% because it will never use all physical CPUs at once.
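The ceiling follows directly from that division. As a sketch based only on the description above (not on virt-top's source code, and with a made-up function name):

```python
def max_guest_percent(guest_vcpus, host_cpus):
    """Highest percentage a guest can reach when 100% means all host CPUs."""
    # A guest can keep at most one host CPU busy per vCPU.
    busy_cpus = min(guest_vcpus, host_cpus)
    return 100.0 * busy_cpus / host_cpus


print(max_guest_percent(4, 8))  # 50.0
```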

Conclusion

CPU utilization can be confusing on multi-processor systems, which is most computers today. Interpreting CPU utilization metrics requires knowing whether Solaris mode or Irix mode was used for calculation. Be careful with CPU utilization metrics!

Friday, February 7, 2020

Apply for QEMU Outreachy 2020 May-August internships now!

QEMU is participating in the Outreachy open source internship program again this year. Check out the QEMU blog for more information about this 12-week full-time, paid, remote work internship working on QEMU.