Friday, August 14, 2015

Asynchronous file I/O on Linux: Plus ça change

In 2009 Anthony Liguori gave a presentation at Linux Plumbers Conference about the state of asynchronous file I/O on Linux. He talked about what was missing from the POSIX AIO and Linux AIO APIs. I recently started thinking about this again after reading the source code for the io_submit(2) system call.

Over half a decade has passed and plus ça change, plus c'est la même chose. Sure, there are new file systems, device-mapper targets, the multiqueue block layer, and high IOPS PCI SSDs. There's DAX for storage devices accessible via memory load/store instructions - radically different from the block device model.

However, the io_submit(2) system call remains a treacherous ally in the quest for asynchronous file I/O. I don't think much has changed since 2009 to make Linux AIO the best asynchronous file I/O mechanism.

The main problem is that io_submit(2) waits for I/O in some cases. It can block! This defeats the purpose of asynchronous file I/O because the caller is stuck until the system call completes. If called from a program's event loop, the program becomes unresponsive until the system call returns. Even if io_submit(2) is invoked from a dedicated thread where blocking doesn't matter, a request that blocks delays every other I/O request submitted in the same io_submit(2) call.

Sources of blocking in io_submit(2) depend on the file system and block devices being used. There are many different cases but in general they occur because file I/O code paths contain synchronous I/O (for metadata I/O or page cache write-out) as well as locks/waiting (for serializing operations). This is why the io_submit(2) system call can be held up while submitting a request.

This means io_submit(2) works best on fully-allocated files, volumes, or block devices. Anything else is likely to result in blocking behavior and cause poor performance.
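To make "fully-allocated" concrete, here is a minimal sketch of creating a file whose blocks are reserved up front with posix_fallocate(3), via Python's os.posix_fallocate(). The helper name create_preallocated_file and the 1 MiB size are made up for illustration. Preallocation only removes block allocation from the write path. Whether that is enough to keep submission from blocking still depends on the file system.

```python
import os
import tempfile


def create_preallocated_file(path, size):
    """Hypothetical helper: create `path` with `size` bytes allocated up front.

    Returns an open file descriptor; the caller must close it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Ask the file system to reserve storage for [0, size) now, so later
        # writes inside that range do not need to allocate new blocks.
        os.posix_fallocate(fd, 0, size)
    except OSError:
        os.close(fd)
        raise
    return fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        fd = create_preallocated_file(os.path.join(tmpdir, "disk.img"), 1024 * 1024)
        try:
            st = os.fstat(fd)
            # st_blocks counts 512-byte units actually allocated on disk.
            print("size:", st.st_size, "allocated bytes:", st.st_blocks * 512)
        finally:
            os.close(fd)
```

A sparse file created with ftruncate(2) would report the same size but far fewer allocated blocks, which is exactly the case where writes have to allocate on the fly.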

Since these conditions don't apply in many cases, QEMU has its own userspace thread-pool with worker threads that call preadv(2)/pwritev(2). It would be nice to default to Linux AIO but the limitations are too serious.
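The thread-pool approach can be sketched in a few lines. This is not QEMU's implementation (QEMU is written in C); it is a Python illustration of the same idea, where worker threads make the blocking preadv(2)/pwritev(2) calls and the submitter gets a future back immediately. The class name ThreadPoolFile and the worker count are assumptions made for the example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


class ThreadPoolFile:
    """Hypothetical helper: run blocking preadv/pwritev calls in worker threads."""

    def __init__(self, fd, max_workers=4):
        self.fd = fd
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit_pwritev(self, buffers, offset):
        # Returns at once; a worker thread blocks in pwritev(2) instead.
        return self.pool.submit(os.pwritev, self.fd, buffers, offset)

    def submit_preadv(self, buffers, offset):
        # `buffers` must be writable (e.g. bytearray); they are filled in place.
        return self.pool.submit(os.preadv, self.fd, buffers, offset)

    def close(self):
        # Wait for in-flight requests, then stop the workers.
        self.pool.shutdown(wait=True)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        aio = ThreadPoolFile(f.fileno())

        write_future = aio.submit_pwritev([b"hello ", b"world"], 0)
        # An event loop would keep running here instead of waiting.
        nwritten = write_future.result()

        head, tail = bytearray(6), bytearray(5)
        nread = aio.submit_preadv([head, tail], 0).result()
        print(nwritten, nread, bytes(head + tail))

        aio.close()
```

Submission never blocks on file system internals here because the system call itself happens on a worker. The price is a thread handoff per request, which is why defaulting to Linux AIO would still be nicer if it were reliably non-blocking.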

Have there been new developments or did I get something wrong? Let me know in the comments.