Sunday, June 20, 2021

My performance benchmarking workflow (2021)

Benchmarking computer systems is time-consuming because setting up the necessary environment involves a lot of work. Over time I have built a workflow that mitigates the cost of setting up benchmarks and allows me to analyze performance more effectively. This blog post covers my workflow as of 2021.

Performance investigations often follow these steps:

  1. Set up hardware and software.
  2. Run initial benchmarks to verify that the bottleneck under investigation is being triggered.
  3. Collect a full set of benchmark results and monitoring data.
  4. Analyze results and form a hypothesis about the bottleneck.
  5. Implement a proof-of-concept optimization to test the hypothesis.
  6. Go to Step 3 until the desired benchmark results are reached, keeping those optimizations that helped.

This is a long process that is costly to pause/resume or to replicate later. Setting up hardware and software manually is both time-consuming and error-prone, so we don't want to do it more than once. There is also a risk that replicating the benchmark on another machine will fail to produce identical results due to differences in the environments.

The consequence of such a high-overhead process is that we minimize its use because we cannot afford to run through it as often as we'd like. That means we cannot answer all the performance questions we'd like to, so our understanding is limited and we miss insights that would enable us to make performance improvements.

A more lightweight process would encourage experimentation and lead to higher productivity.

An ideal workflow

In a low-overhead world I would like to do the following:

  1. Set up hardware and software once only and be able to return to that state again in the future at the press of a button.
  2. Capture the full benchmarking environment so the configuration can be inspected and modified easily.
  3. Store benchmark results so that each run is available for further analysis in the future.

The workflow is actually quite similar to developing code with git:

  1. Create a topic branch for this performance investigation.
  2. Add an environment definition to produce the desired hardware and software state.
  3. Run the benchmark and collect the results.
  4. Commit the environment and results.
  5. Go to Step 2 to modify the environment (e.g. apply proof-of-concept patches to software) and repeat.

Since the performance investigation is captured in a git branch it's easy to switch to another investigation without losing history.

This is actually what I do! Git provides the storage and time machine functionality for easily pausing/resuming or replicating performance investigations.
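
As a rough sketch, one iteration of that loop might look like this on the command line. The branch name, playbook file name, and inventory file below are placeholders, not part of a fixed convention:

$ git checkout -b polling-investigation      # new branch for this investigation
$ $EDITOR playbook.yml files/fio.sh          # tweak the environment definition
$ ansible-playbook -i inventory playbook.yml # provision, run benchmarks, collect results
$ git add playbook.yml files/ notebook/
$ git commit -m 'fio results with polling off/on'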

Ansible

Ansible provides the automation needed to put hardware and software into the desired state for benchmarking. Its killer feature is the large ecosystem of modules for handling tasks like installing packages, configuring virtual machines and containers, and so on. Thanks to this module collection I find Ansible more productive than plain Python or shell scripting.

I've begun collecting Ansible tasks for Linux KVM development in virt-tasks. If you're wondering what the configuration for running a benchmark looks like, here is an Ansible playbook that builds QEMU and a guest kernel, creates a Fedora 34 virtual machine, runs the fio disk I/O benchmark, and collects the results:

---
- hosts: hosts
  tasks:
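    # Build QEMU v6.0.0 from source (see tasks/build-qemu.yml)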
    - include_tasks: tasks/build-qemu.yml
      vars:
        - repo: https://gitlab.com/qemu-project/qemu.git
        - version: v6.0.0

    - name: create disk image
      include_tasks: tasks/virt-builder-create-image.yml
      vars:
        - os_version: fedora-34
        - size: 32G
        - output: /var/lib/libvirt/images/test.img
        - format: raw

    - name: build guest kernel
      include_tasks: tasks/build-kernel.yml
      vars:
        - repo: https://gitlab.com/stefanha/linux.git
        - version: cpuidle-haltpoll-virtqueue
        - config_src_path: files/.config

    - name: stop vm
      virt:
        name: test
        state: shutdown
      ignore_errors: yes

    - name: start vm
      include_tasks: tasks/start-vm.yml
      vars:
        - xml: "{{ lookup('file', 'files/test.xml') }}"
        - host: 192.168.122.192

- hosts: vms
  tasks:
    - name: install fio and rsync
      dnf:
        state: present
        name:
          - fio
          - rsync

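    # First fio run; the results are fetched into poll_source-off below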
    - name: run fio
      script: files/fio.sh

    - name: fetch fio output files
      synchronize:
        src: fio-output/
        dest: notebook/fio-output/poll_source-off
        mode: pull

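    # Second fio run with fio.sh --enable; the results are fetched into poll_source-on below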
    - name: run fio
      script: files/fio.sh --enable

    - name: fetch fio output files
      synchronize:
        src: fio-output/
        dest: notebook/fio-output/poll_source-on
        mode: pull

- hosts: hosts
  tasks:
    - name: stop vm
      virt:
        name: test
        state: shutdown

An important point is that the Ansible playbook sets up the full environment, runs all benchmarks, and collects the results. Unlike a playbook that runs a single benchmark, this one runs the full suite so that it captures the entire environment that produced the results. This distinction matters when tweaking the benchmark configuration or trying out proof-of-concept optimizations: each git commit needs to encompass the full environment so that the performance investigation is reproducible and can be resumed in the future.
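
For reference, the playbook above targets two inventory groups, "hosts" (the bare-metal machine) and "vms" (the guest). A minimal inventory and invocation might look like the following sketch; the host name is a placeholder and the guest address matches the one used in the start-vm task:

$ cat inventory
[hosts]
kvm-host.example.com

[vms]
192.168.122.192

$ ansible-playbook -i inventory playbook.yml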

Jupyter

I recently started using JupyterLab notebooks for data analysis. JupyterLab provides a convenient environment for graphing results and organizing them in documents.

Thanks to the official Jupyter container images you can get a full Python data analysis environment running with just one command:

$ podman run --userns=keep-id -e JUPYTER_ENABLE_LAB=yes -p 8888:8888 --rm -v "$PWD":/home/jovyan/work:z jupyter/scipy-notebook

So far I have only scratched the surface of JupyterLab. It works well for visualizing data, although the way I currently use it is not much different from writing a Python matplotlib script and running it from the command line. In time I'll get a better appreciation for its strengths and weaknesses.

Conclusion

A git-based workflow that automates benchmark setup and stores the results goes a long way towards mitigating the high overhead of performance investigations. If I'm interrupted or need to switch to a different machine, it's easy to resume the investigation. Having the results and the details of the environment stored together makes it possible to revisit benchmark runs in the future to reproduce or tweak them. The combination of git, Ansible, and Jupyter achieves this workflow quite well, but if you're familiar with other tools that achieve this, I'd love to hear about them!