Saturday, February 15, 2020

An introduction to GDB scripting in Python

Sometimes it's not humanly possible to inspect or modify data structures manually in a debugger because they are too large or complex to navigate. Think of a linked list with hundreds of elements, one of which you need to locate. Finding the needle in the haystack is only possible by scripting the debugger to automate repetitive steps.

This article gives an overview of the GNU Debugger's Python scripting support so that you can tackle debugging tasks that are not possible manually.

What scripting GDB in Python can do

GDB can load Python scripts to automate debugging tasks and to extend debugger functionality. I will focus mostly on automating debugging tasks but extending the debugger is very powerful though rarely used.

Say you want to search a linked list for a particular node:

(gdb) p node.next
...
(gdb) p node.next.next
...
(gdb) p node.next.next.next

Doing this manually can be impossible for lists with too many elements. GDB scripting support allows this task to be automated by writing a script that executes debugger commands and interprets the results.

Loading Python scripts

The source GDB command executes files ending with the .py extension in a Python interpreter. The interpreter has access to the gdb Python module that exposes debugging APIs so your script can control GDB.

$ cat my-script.py
print('Hi from Python, this is GDB {}'.format(gdb.VERSION))
$ gdb
(gdb) source my-script.py
Hi from Python, this is GDB Fedora 8.3.50.20190824-28.fc31

Notice that the gdb module is already imported. See the GDB Python API documentation for full details of this module.

It's also possible to run ad-hoc Python commands from the GDB prompt:

(gdb) py print('Hi')
Hi

Executing commands

GDB commands are executed using gdb.execute(command, from_tty, to_string). For example, gdb.execute('step') runs the step command. Output can be collected as a Python string by setting to_string to True. By default output goes to the interactive GDB session.

Although gdb.execute() is fundamental to GDB scripting, at best it allows screen-scraping (interpreting the output string) rather than a Pythonic way of controlling GDB. There is actually a full Python API that represents the debugged program's types and values in Python. Most scripts will use this API instead of simply executing GDB commands as if simulating an interactive shell session.

Navigating program variables

The entry point to navigating program variables is gdb.parse_and_eval(expression). It returns a gdb.Value.

When a gdb.Value is a struct its fields can be indexed using value['field1']['child_field1'] syntax. The following example iterates a linked list:

elem = gdb.parse_and_eval('block_backends.tqh_first')
while elem:
    name = elem['name'].string()
    if name == 'drive2':
        print('Found {}'.format(elem['dev']))
        break
    elem = elem['link']['tqe_next']

This script iterates the block_backends linked list and checks the name field of each element against "drive2". When it finds "drive2" it prints the dev field of that element.

There is a lot more that GDB Python scripts can do but you'll have to check out the API documentation to learn about it.

Conclusion

Python scripts can automate tedious debugging tasks in GDB. Having the full power of Python and access to file I/O, HTTP requests, etc means pretty much any debugging task can be turned into a full-blown program. A subset of this was possible in the past through GDB command scripts, but Python is a much more flexible programming language and familiar to many developers (more so than GDB's own looping and logic commands!).