Thursday, April 30, 2020

How the Linux VFS, block layer, and device drivers fit together

The Linux kernel storage stack consists of several components including the Virtual File System (VFS) layer, the block layer, and device drivers. This article gives an overview of the main objects that a device driver interacts with and their relationships to each other. Actual I/O requests are not covered, instead the focus is on the objects representing the disk.

Let's start with a diagram of the key data structures and then an explanation of how they work together.

The Virtual File System (VFS) layer

The VFS layer is where file system concepts like files and directories are handled. The VFS provides an interface that file systems like ext4, XFS, and NFS implement to register themselves with the kernel and participate in file system operations. The struct file_operations interface is the most interesting for device drivers as we are about to see.

System calls like open(2), read(2), etc are handled by the VFS and dispatched to the appropriate struct file_operations functions.

Block device nodes like /dev/sda are implemented in fs/block_dev.c, which forms a bridge between the VFS and the Linux block layer. The block layer handles the actual I/O requests and is aware of disk-specific information like capacity and block size.

The main VFS concept that device drivers need to be aware of is struct block_device_operations and the struct block_device instances that represent block devices in Linux. A struct block_device connects the VFS inode and struct file_operations interface with the block layer struct gendisk and struct request_queue.

In Linux there are separate device nodes for the whole device (/dev/sda) and its partitions (/dev/sda1, /dev/sda2, etc). This is handled by struct block_device so that a partition has a pointer to its parent in bd_contains.

The block layer

The block layer handles I/O request queues, disk partitions, and other disk-specific functionality. Each disk is represented by a struct gendisk and may have multiple struct hd_struct partitions. There is always part0, a special "partition" covering the entire block device.

I/O requests are placed into queues for processing. Requests can be merged and scheduled by the block layer. Ultimately a device driver receives a request for submission to the physical device. Queues are represented by struct request_queue.

The device driver

The disk device driver registers a struct genhd with the block layer and sets up the struct request_queue to receive requests that need to be submitted to the physical device.

There is one struct genhd for the entire device even though userspace may open struct block_device instances for multiple partitions on the disk. Disk partitions are not visible at the driver level because I/O requests have already had their Logical Block Address (LBA) adjusted with the partition start offset.

How it all fits together

The VFS is aware of the block layer struct gendisk. The device driver is aware of both the block layer and the VFS struct block_device. The block layer does not have direct connections to the other components but the device driver provides callbacks.

One of the interesting aspects is that a device driver may drop its reference to struct gendisk but struct block_device instances may still have their references. In this case no I/O can occur anymore because the driver has stopped the disk and the struct request_queue, but userspace processes can still call into the VFS and struct block_device_operations callbacks in the device driver can still be invoked.

Thinking about this case is why I drew the diagram and ended up writing about this topic!