Saturday, January 10, 2026

Building a virtio-serial FPGA device (Part 4): Virtqueue processing

This is the fourth post in a series about building a virtio-serial device in Verilog for an FPGA development board. This time we'll look at processing the virtio-serial device's transmit and receive virtqueues.

Series table of contents

  1. Part 1: Overview
  2. Part 2 - MMIO registers, DMA, and interrupts
  3. Part 3 - virtio-serial device design
  4. Part 4 - Virtqueue processing (you are here)
  5. Part 5 - UART receiver and transmitter
  6. Part 6 - Writing the RISC-V firmware

The code is available at https://gitlab.com/stefanha/virtio-serial-fpga.

The virtio-serial device has a pair of virtqueues that allow the driver to transmit and receive data. The driver enqueues empty buffers onto the receiveq (virtqueue 0) and the device fills them with received data. The driver enqueues buffers containing data onto the transmitq (virtqueue 1) and the device sends them.

This logic is split into two modules: virtqueue_reader for the transmitq and virtqueue_writer for the receiveq. The interface of virtqueue_reader looks like this:

/* Stream data from a virtqueue without framing */
module virtqueue_reader (
    input clk,
    input resetn,

    /* Number of elements in descriptor table */
    input [15:0] queue_size,
    /* Lower 32-bits of Virtqueue Descriptor Area address */
    input [31:0] queue_desc_low,
    /* Lower 32-bits of Virtqueue Driver Area address */
    input [31:0] queue_driver_low,
    /* Lower 32-bits of Virtqueue Device Area address */
    input [31:0] queue_device_low,
    input queue_notify,            /* kick */

    input phase,
    output reg [31:0] data = 0,
    output reg [2:0] data_len = 0,
    output ready,

    /* For DMA */
    output reg ram_valid = 0,
    input ram_ready,
    output reg [3:0] ram_wstrb = 0,
    output reg [21:0] ram_addr = 0,
    output reg [31:0] ram_wdata = 0,
    input [31:0] ram_rdata
);

If you are familiar with the VIRTIO specification you might recognize queue_size, queue_desc_low, queue_driver_low, queue_device_low, and queue_notify since they are values provided by the VIRTIO MMIO Transport. The driver configures them with the memory addresses of the virtqueue data structures in RAM. The device will DMA to access those data structures. The driver can kick the device to indicate that new buffers have been enqueued using queue_notify.

The reader interface consists of phase, data, data_len, and ready. This is what the rdwr_stream module needs in order to use virtqueue_reader as a data source. rdwr_stream keeps reading the next byte(s) by flipping the phase bit and waiting for ready to be asserted by the device. Note that the device can provide up to 4 bytes at a time through the 32-bit data register, and data_len allows the device to indicate how many bytes were read.
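
To make the data/data_len pair concrete, here is a minimal Rust sketch of how a consumer might unpack one transfer. The byte order is an assumption for illustration (first byte in the least significant bits), and unpack_transfer is a hypothetical helper that is not part of the project.

```rust
/// Unpack one reader transfer into bytes.
/// Assumption: byte 0 sits in bits 7:0 of `data` (little-endian packing).
fn unpack_transfer(data: u32, data_len: u8) -> Vec<u8> {
    // data_len is 3 bits wide in the interface, but only 0..=4 is meaningful
    // for a 32-bit data register.
    assert!(data_len <= 4, "at most 4 bytes fit in a 32-bit word");
    data.to_le_bytes()[..data_len as usize].to_vec()
}
```

For example, a transfer with data = 0x00006948 and data_len = 2 would yield the bytes 0x48 and 0x69 under this assumption.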

Finally, the DMA interface is how virtqueue_reader initiates RAM accesses so it can fetch the virtqueue data structures that the driver has configured.

The state machine

Virtqueue processing consists of multiple steps and cannot be completed within a single clock cycle. Therefore the processing is decomposed into a state machine where each step consists of a DMA transfer or waiting for an event. Here are the states:

`define STATE_WAIT_PHASE 0                 /* waiting for phase bit to flip */
`define STATE_READ_AVAIL_IDX 1             /* waiting for avail.idx read */
`define STATE_WAIT_NOTIFY 2                /* waiting for queue notify (kick) */
`define STATE_READ_DESCRIPTOR_ADDR_LOW 3   /* waiting for descriptor read */
`define STATE_READ_DESCRIPTOR_LEN 4
`define STATE_READ_DESCRIPTOR_FLAGS_NEXT 5
`define STATE_READ_BUFFER 6                /* waiting for data buffer read */
`define STATE_WRITE_USED_ELEM_ID 7         /* waiting for used element write */
`define STATE_WRITE_USED_ELEM_LEN 8
`define STATE_WRITE_USED_FLAGS_IDX 9       /* waiting for used.flags/used.idx write */
`define STATE_READ_AVAIL_RING_ENTRY 10     /* waiting for avail element read */

The device starts up in STATE_WAIT_PHASE because it is waiting to be asked to read the first byte(s). As soon as rdwr_stream flips the phase bit, virtqueue_reader must check the virtqueue to see if any data buffers are available.

I won't describe all the details of virtqueue processing, but here is a summary of the steps involved. See the VIRTIO specification or the code for the details.

  1. Read the avail.idx field from RAM in case the driver has enqueued more buffers.
  2. Read the avail.ring[i] entry from RAM to fetch the descriptor table index of the next available buffer.
  3. Read the descriptor from RAM to find out the buffer address and length.
  4. Repeatedly read bytes from the buffer until the current descriptor is empty. If the descriptor is chained, read the next descriptor from RAM and repeat.
  5. If the chain is finished, check avail.idx again in case there are more buffers available.

After a buffer has been fully consumed, there are also several steps to fill out a used ring element and increment the used.idx field so that the driver is aware that the buffer is done.
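
The steps above can be sketched in software. The following Rust model is only an illustration of the split virtqueue layout from the VIRTIO specification, not a translation of the Verilog: the Queue struct, its field names, and the byte-array RAM are all hypothetical, and only the low 32 bits of each buffer address are used.

```rust
const DESC_F_NEXT: u16 = 1; // VIRTQ_DESC_F_NEXT in the VIRTIO specification

fn rd16(ram: &[u8], a: usize) -> u16 {
    u16::from_le_bytes([ram[a], ram[a + 1]])
}
fn rd32(ram: &[u8], a: usize) -> u32 {
    u32::from_le_bytes([ram[a], ram[a + 1], ram[a + 2], ram[a + 3]])
}
fn wr16(ram: &mut [u8], a: usize, v: u16) {
    ram[a..a + 2].copy_from_slice(&v.to_le_bytes());
}
fn wr32(ram: &mut [u8], a: usize, v: u32) {
    ram[a..a + 4].copy_from_slice(&v.to_le_bytes());
}

/// Hypothetical software model of the device side of one split virtqueue.
struct Queue {
    size: u16,
    desc: usize,     // Descriptor Area address
    driver: usize,   // Driver Area (avail ring) address
    device: usize,   // Device Area (used ring) address
    last_avail: u16, // next avail.ring slot the device will look at
    used_idx: u16,   // the device's copy of used.idx
}

impl Queue {
    /// Consume one buffer chain and return its bytes, or None if the queue is empty.
    fn pop(&mut self, ram: &mut [u8]) -> Option<Vec<u8>> {
        // Step 1: read avail.idx (offset 2 in the avail ring, after flags)
        let avail_idx = rd16(ram, self.driver + 2);
        if avail_idx == self.last_avail {
            return None; // empty: the hardware would wait for queue_notify here
        }
        // Step 2: read avail.ring[i] to get the head descriptor index
        let slot = (self.last_avail % self.size) as usize;
        let head = rd16(ram, self.driver + 4 + 2 * slot);
        self.last_avail = self.last_avail.wrapping_add(1);

        // Steps 3 and 4: walk the descriptor chain and read each buffer
        let mut out = Vec::new();
        let mut i = head;
        loop {
            let d = self.desc + 16 * i as usize; // descriptors are 16 bytes each
            let addr = rd32(ram, d) as usize;    // low 32 bits of the buffer address
            let len = rd32(ram, d + 8) as usize;
            let flags = rd16(ram, d + 12);
            out.extend_from_slice(&ram[addr..addr + len]);
            if (flags & DESC_F_NEXT) == 0 {
                break;
            }
            i = rd16(ram, d + 14); // next descriptor in the chain
        }

        // Fill in the used ring element, then publish it by bumping used.idx
        let u = self.device + 4 + 8 * (self.used_idx % self.size) as usize;
        wr32(ram, u, head as u32); // id of the chain's head descriptor
        wr32(ram, u + 4, 0);       // len is 0: nothing was written into the buffer
        self.used_idx = self.used_idx.wrapping_add(1);
        wr16(ram, self.device + 2, self.used_idx);
        Some(out)
    }
}
```

Each rd16/rd32/wr16/wr32 call in this model corresponds roughly to one DMA transfer, which is why the hardware version needs a separate state for each of them.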

There are two wait states in which the device stops until there is more work to do. First, rdwr_stream will stop asking to read more data if the writer is too slow. This flow control ensures that data is not dropped due to a slow writer. This is STATE_WAIT_PHASE. Second, if the device wants to read but the virtqueue is empty, then it has to wait until queue_notify goes high. This is STATE_WAIT_NOTIFY.

The virtqueue_writer module is similar to virtqueue_reader but it fills in the buffers with data instead of consuming them.

A quick side note about memory alignment: the memory interface is 32-bit aligned, so it is only possible to read an entire 32-bit value from memory at multiples of 4 bytes. On a fancier CPU with a cache the unit would be a cache line (e.g. 128 bytes). When the data structures being DMAed are not aligned it becomes tedious to handle the shifting and masking, especially when reading data from a source and writing it to a destination. Life is much simpler when everything is aligned, because data can be trivially read or written in a single access without any special logic to adjust the data to fit the cache line size.
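
To illustrate why unaligned data is tedious, here is a small Rust sketch of reading a 32-bit little-endian value at an arbitrary byte address from a memory that only supports aligned 32-bit accesses. The function name and the slice-of-words memory are made up for this example.

```rust
/// Read a little-endian 32-bit value at an arbitrary byte address from a
/// memory that can only be accessed one aligned 32-bit word at a time.
fn read_u32_unaligned(words: &[u32], byte_addr: usize) -> u32 {
    let index = byte_addr / 4;       // aligned word containing the first byte
    let shift = (byte_addr % 4) * 8; // bit offset of the first byte in that word
    if shift == 0 {
        return words[index]; // aligned: a single access is enough
    }
    // Unaligned: two accesses, then shift and merge the two parts
    let low = words[index] >> shift;
    let high = words[index + 1] << (32 - shift);
    low | high
}
```

The aligned case is a single memory access, while the unaligned case needs two accesses plus shifting and merging. In hardware that means extra states and extra registers.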

Conclusion

The virtqueue_reader and virtqueue_writer modules use DMA to read or write data from/to RAM buffers provided by the driver running on the PicoRV32 RISC-V CPU inside the FPGA. They are state machines that run through a sequence of DMA transfers and provide the reader/writer interfaces that the rdwr_stream module uses to transfer data. In the next post we will look at the UART receiver and transmitter.

Building a virtio-serial FPGA device (Part 6): Writing the RISC-V firmware

This is the final post in a series about building a virtio-serial device in Verilog for an FPGA development board. This time we'll look at the firmware running on the PicoRV32 RISC-V soft-core in the FPGA.

Series table of contents

  1. Part 1: Overview
  2. Part 2 - MMIO registers, DMA, and interrupts
  3. Part 3 - virtio-serial device design
  4. Part 4 - Virtqueue processing
  5. Part 5 - UART receiver and transmitter
  6. Part 6 - Writing the RISC-V firmware (you are here)

The code is available at https://gitlab.com/stefanha/virtio-serial-fpga.

The PicoRV32 RISC-V soft-core boots up executing code from flash memory at 0x10000000. Since RISC-V is supported by LLVM and gcc, it is possible to write the firmware in several languages. For this project I wanted to use Rust and was aware of several existing crates that already provide APIs for things that would be needed.

I used a Rust no_std environment, which means that the standard library (std) is not available and only the core library (core) is available. Crates written for embedded systems and low-level programming often support no_std, but most other crates rely on the standard library and an operating system. no_std is a niche in the Rust ecosystem but it works pretty well.

The following crates came in handy:

  • riscv-rt provides the basic startup code for bare metal on RISC-V. It has the linker script, assembly pre-Rust startup code, and provides things that Rust's runtime needs.
  • safe-mmio is an API for MMIO device register access. This was helpful for low-level testing of device registers during the early phases of the project.
  • virtio-drivers has a virtio-serial driver! I didn't need to implement virtqueues, the VIRTIO MMIO Transport, or the virtio-serial driver software myself.

Initially I thought I could get away without a memory allocator since no_std does not have one by default and it would be extra work to set one up. However, virtio-drivers needed one for the virtio-serial device (I don't think it is really necessary, but the code is written that way). Luckily the embedded-alloc crate has memory allocators that are easy to set up and just need a piece of memory to operate in.

Aside from the setup code, the firmware is trivial. The CPU just sends a hello world message and then echoes back bytes received from the virtio-serial device.

#[riscv_rt::entry]
fn main() -> ! {
    unsafe {
        extern "C" {
            static _heap_size: u8;
        }
        let heap_bottom = riscv_rt::heap_start() as usize;
        let heap_size = &_heap_size as *const u8 as usize;
        HEAP.init(heap_bottom, heap_size);
    }

    // Point virtio-drivers at the MMIO device registers
    let header = NonNull::new(0x04000000u32 as *mut VirtIOHeader).unwrap();
    let transport = unsafe { MmioTransport::new(header, 0x1000) }.unwrap();

    // Put the string on the stack so the device can DMA (it cannot DMA flash memory)
    let mut buf: [u8; 13] = *b"Hello world\r\n";

    if transport.device_type() == DeviceType::Console {
        // HalImpl is a placeholder name for the firmware's virtio_drivers::Hal implementation
        let mut console = VirtIOConsole::<HalImpl, _>::new(transport).unwrap();
        console.send_bytes(&buf).unwrap();
        loop {
            if let Ok(Some(ch)) = console.recv(true) {
                buf[0] = ch;
                console.send_bytes(&buf[0..1]).unwrap();
            }
        }
    }
    loop {}
}

In the early phases I ran tests on the iCESugar board that lit up an LED to indicate the test result. As things became more complex I switched over to Verilog simulation. I wrote testbenches that exercise the Verilog modules I had written. This is similar to unit testing software.

In the later stages of the project, I changed the approach once more in order to do integration testing and debugging. To get more visibility into what was happening in the full design with a CPU and virtio-serial device, I used GTKWave to view the VCD files that Icarus Verilog can write during simulation. You can see every cycle and every value in each register or wire in the entire design, including the PicoRV32 RISC-V CPU, virtio-serial device, etc.

This allowed very powerful debugging since the CPU activity is visible (see the program counter in the reg_pc register in the screenshot) alongside the virtio-serial device's internal state. It is possible to look up the program counter in the firmware disassembly to follow the program flow and see where things went wrong.

Conclusion

The firmware is a small Rust codebase that uses existing crates, including riscv-rt and virtio-drivers. Throughout the project I used several debugging and simulation approaches, depending on the level of complexity. Thanks to the open source code and tools available, it was possible to complete this project using fairly convenient and powerful tools and without spending a lot of time reinventing the wheel. Or at least without reinventing the wheels I didn't want to reinvent :).

Let me know if you enjoy FPGAs and tell me about projects you've done!

Building a virtio-serial FPGA device (Part 5): UART receiver and transmitter

This is the fifth post in a series about building a virtio-serial device in Verilog for an FPGA development board. This time we'll look at the UART receiver and transmitter.

Series table of contents

  1. Part 1: Overview
  2. Part 2 - MMIO registers, DMA, and interrupts
  3. Part 3 - virtio-serial device design
  4. Part 4 - Virtqueue processing
  5. Part 5 - UART receiver and transmitter (you are here)
  6. Part 6 - Writing the RISC-V firmware

The code is available at https://gitlab.com/stefanha/virtio-serial-fpga.

How UARTs work

A Universal Asynchronous Receiver-Transmitter (UART) is a simple interface for data transfer that only requires a transmitter (tx) and a receiver (rx) wire. There is no clock wire because both sides of the connection agree on a data transfer rate (the baud rate) in advance, then use their own clocks to sample the signal and reconstruct the bits being transferred. The baud rate is usually modest and the frame encoding is also not the most efficient way of transferring data, but UARTs get the job done and are commonly used for debug consoles, modems, and other relatively low data rate interfaces.

There is a framing protocol that makes it easier to reconstruct the transferred data. This is important because failure to correctly reconstruct the data results in corrupted data being received on the other side. In this project I used a 9,600 bit/s baud rate and 8 data bits, no parity bit, and 1 stop bit (sometimes written as 8N1). The framing works as follows:

  • When no data is being transferred, the signal is 1.
  • Before the data byte, a start bit is sent with the value 0. This way a receiver can detect the beginning of a frame.
  • The start bit is followed by the 8 data bits, least significant bit first.
  • After the data bits the frame ends with a stop bit with the value 1.

The job of the transmitter is to follow this framing protocol. The job of the receiver is to detect the next frame and to reconstruct the byte being transferred.
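
As a software illustration of the 8N1 framing described above, here is a short Rust sketch that encodes a byte into a 10-bit frame and decodes it again. The function names are made up for this example and the real modules work on a wire over time rather than on an array of bits.

```rust
/// Encode one byte as a 10-bit 8N1 frame:
/// start bit (0), 8 data bits (least significant bit first), stop bit (1).
fn encode_8n1(byte: u8) -> [bool; 10] {
    let mut frame = [true; 10]; // the stop bit at index 9 stays 1
    frame[0] = false;           // the start bit is 0
    for i in 0..8usize {
        frame[1 + i] = ((byte >> i) & 1) == 1;
    }
    frame
}

/// Decode a frame, returning None if the start or stop bit is wrong.
fn decode_8n1(frame: &[bool; 10]) -> Option<u8> {
    if frame[0] || !frame[9] {
        return None;
    }
    let mut byte = 0u8;
    for i in 0..8usize {
        byte |= (frame[1 + i] as u8) << i;
    }
    Some(byte)
}
```

Encoding the letter 'A' (0x41) produces the line levels 0, 1, 0, 0, 0, 0, 0, 1, 0, 1 in transmission order.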

Implementation

The uart_reader and uart_writer modules implement the UART receiver and transmitter, respectively. They are designed around the rdwr_stream module's reader and writer interfaces. That means uart_reader receives the next byte from the UART rx pin whenever it is asked to read more data and uart_writer transmits on the UART tx pin whenever it is asked to write more data.

uart_reader follows a trick I learnt from the PicoSoC's simpleuart module: once the rx pin goes from 1 to 0, it waits until half the bit period has passed before sampling the rx pin, so that the following samples land near the middle of each bit. For example, at 9,600 baud with a 12 MHz clock, one bit period is 12,000,000 / 9,600 = 1,250 clock cycles, so half a period is 625 clock cycles. This works well because the UART only transfers data on the iCESugar PCB and is not exposed to much noise. Fancier approaches involve sampling the pin every clock cycle in order to try to reconstruct the value more accurately, but they don't seem to be necessary for this project.
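
The timing arithmetic can be written down as a couple of small Rust helpers. These are hypothetical and only approximate the hardware, because they ignore the cycle or two of latency that registered counters add.

```rust
/// Clock cycles per UART bit for a given clock frequency and baud rate.
fn cycles_per_bit(clock_hz: u32, baud: u32) -> u32 {
    clock_hz / baud
}

/// Approximate clock cycle, counted from the falling edge of the start bit,
/// at which data bit `n` (0..=7) is sampled: half a bit period to reach the
/// middle of the start bit, then one full period per bit after that.
fn sample_cycle(div: u32, n: u32) -> u32 {
    div / 2 + (n + 1) * div
}
```

With a 12 MHz clock and 9,600 baud, cycles_per_bit gives 1,250, the first data bit is sampled around cycle 1,875, and the last one around cycle 10,625.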

Here is the core uart_reader code, a state machine that parses the incoming frame:

always @(posedge clk) begin
    ...
    div_counter <= div_counter + 1;
    case (bit_counter)
    0: begin // looking for the start bit
        if (rx == `START_BIT) begin
            div_counter <= 0;
            bit_counter <= 1;
        end
    end
    1: begin
        /* Sample in the middle of the period */
        if (div_counter == clk_div >> 1) begin
            div_counter <= 0;
            bit_counter <= 2;
        end
    end
    10: begin // expecting the stop bit
        if (div_counter == clk_div) begin
            if (rx == `STOP_BIT && !reg_ready) begin
                data <= {24'h0, rx_buf};
                data_len <= 1;
                reg_ready <= 1;
            end
            bit_counter <= 0;
        end
    end
    default: begin // receive the next data bit
        if (div_counter == clk_div) begin
            rx_buf <= {rx, rx_buf[7:1]};
            div_counter <= 0;
            bit_counter <= bit_counter + 1;
        end
    end
    endcase

The uart_writer module is similar, but it has a transmit buffer that it sends over the UART tx pin with the framing that I've described here.
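
For readers who are more at home in software, here is a rough Rust model of the uart_reader excerpt above. It mirrors the visible state transitions only: the reg_ready flow control and the elided parts of the Verilog are left out, and the UartRx and feed_frame names are invented for this sketch.

```rust
/// Software model of the receive state machine, one `tick` per clock cycle.
struct UartRx {
    clk_div: u32,     // clock cycles per bit
    div_counter: u32, // cycles since the last sample point
    bit_counter: u8,  // 0 = idle, 1 = start bit, 2..=9 = data bits, 10 = stop bit
    rx_buf: u8,       // shift register for the data bits
}

impl UartRx {
    fn new(clk_div: u32) -> Self {
        UartRx { clk_div, div_counter: 0, bit_counter: 0, rx_buf: 0 }
    }

    /// Advance one clock cycle with the current level of the rx pin.
    /// Returns Some(byte) when a frame with a valid stop bit completes.
    fn tick(&mut self, rx: bool) -> Option<u8> {
        // Like the non-blocking assignments in Verilog, comparisons use the
        // old counter value and the new value only takes effect at the end.
        let mut next_div = self.div_counter.wrapping_add(1);
        let mut received = None;
        match self.bit_counter {
            0 => {
                if !rx {
                    next_div = 0; // start bit detected
                    self.bit_counter = 1;
                }
            }
            1 => {
                if self.div_counter == self.clk_div / 2 {
                    next_div = 0; // now in the middle of the start bit
                    self.bit_counter = 2;
                }
            }
            10 => {
                if self.div_counter == self.clk_div {
                    if rx {
                        received = Some(self.rx_buf); // valid stop bit
                    }
                    self.bit_counter = 0;
                }
            }
            _ => {
                if self.div_counter == self.clk_div {
                    // Shift the new bit in from the top (LSB arrives first)
                    self.rx_buf = ((rx as u8) << 7) | (self.rx_buf >> 1);
                    next_div = 0;
                    self.bit_counter += 1;
                }
            }
        }
        self.div_counter = next_div;
        received
    }
}

/// Drive one 8N1 frame into the model, holding each bit for `bit_ticks` cycles.
fn feed_frame(uart: &mut UartRx, byte: u8, bit_ticks: u32) -> Option<u8> {
    let mut received = None;
    for bit in 0..10u32 {
        let level = match bit {
            0 => false,                        // start bit
            9 => true,                         // stop bit
            n => ((byte >> (n - 1)) & 1) == 1, // data bits, LSB first
        };
        for _ in 0..bit_ticks {
            if let Some(b) = uart.tick(level) {
                received = Some(b);
            }
        }
    }
    received
}
```

Note that comparing the counter against clk_div after resetting it to 0 makes each sample interval one cycle longer than clk_div in this model. That drift is negligible for a large divider but it would matter if clk_div were very small.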

Conclusion

The uart_reader and uart_writer modules are responsible for receiving and transmitting data over the UART rx/tx pins. They implement the framing protocol that UARTs use to delimit each byte on the wire. In the next post we will cover the firmware running on the PicoRV32 RISC-V soft-core that drives the I/O.

Building a virtio-serial FPGA device (Part 3): virtio-serial device design

This is the third post in a series about building a virtio-serial device in Verilog for an FPGA development board. This time we'll look at the design of the virtio-serial device and how to decompose it into modules.

Series table of contents

  1. Part 1: Overview
  2. Part 2 - MMIO registers, DMA, and interrupts
  3. Part 3 - virtio-serial device design (you are here)
  4. Part 4 - Virtqueue processing
  5. Part 5 - UART receiver and transmitter
  6. Part 6 - Writing the RISC-V firmware

The code is available at https://gitlab.com/stefanha/virtio-serial-fpga.

A virtio-serial device is a serial controller, enabling communication with the outside world. The iCESugar FPGA development board has UART rx and tx pins connecting the FPGA to a separate microcontroller that acts as a bridge for USB serial communication. That means the FPGA can wiggle the bits on the UART tx pin to send bytes to a computer connected to the board via USB and it can receive bytes from the computer through the UART rx pin. The purpose of the virtio-serial device is to present a VIRTIO device to the PicoRV32 RISC-V CPU inside the FPGA so the software on the CPU can send and receive data.

Device design

The virtio-serial device implements the Console device type defined in the VIRTIO specification and exposes it to the driver running on the CPU via the VIRTIO MMIO Transport. The terms "serial" and "console" are used interchangeably in the VIRTIO community and I will usually use serial unless I'm specifically talking about the Console device type section in the VIRTIO specification.

VIRTIO separates the concept of a device type (like net, block, or console) from the transport that allows the driver to access the device. This architecture allows VIRTIO to be used across a range of different machines, including machines that have a PCI bus, MMIO devices, and so on. Fortunately the VIRTIO MMIO transport is fairly easy to implement from scratch.

The virtio_serial_mmio module implements the virtio-serial device from the following parts:

  • VIRTIO MMIO Transport - MMIO device registers conforming to the VIRTIO specification. They allow the CPU to configure the device and initiate data transfers.
  • UART reader & virtqueue writer - Incoming data from the UART rx pin is enqueued on the VIRTIO Console receiveq (virtqueue 0) where the driver can receive it.
  • Virtqueue reader & UART writer - The VIRTIO Console transmitq (virtqueue 1) lets the driver enqueue data that the device sends over the UART tx pin.

The virtio-serial device interfaces with the outside world through an MMIO interface that the CPU uses to access the device registers, a DMA interface for initiating RAM memory transfers, and the UART rx/tx pins for actually sending and receiving data.

Note that both the virtqueue_reader and the virtqueue_writer modules require DMA access, so I reused the spram_mux module that multiplexes the CPU and the virtio-serial device's RAM accesses. spram_mux is used inside virtio_serial_mmio to multiplex access to the single DMA interface.

Reader and writer interfaces

Since the job of the device is to transfer data between the virtqueues and the UART rx/tx pins, it is organized around a module named rdwr_stream that constantly attempts to read data from a source and write it to a destination:

/* Stream data from a reader to a writer */
module rdwr_stream (
    input clk,
    input resetn,

    /* The reader interface */
    output reg rd_phase = 0,
    input [31:0] rd_data,
    input [2:0] rd_data_len,
    input rd_ready,

    /* The writer interface */
    output reg wr_phase = 0,
    output reg [31:0] wr_data = 0,
    output reg [2:0] wr_data_len = 0,
    input wr_ready
);

By implementing the reader and writer interfaces for the virtqueues and UART rx/tx pins, it becomes possible to pump data between them using rdwr_stream. For testing it's also possible to configure virtqueue loopback or UART loopback so that the virtqueue logic or the UART logic can be exercised in isolation.

The reader and writer interfaces that the rdwr_stream module uses are the central abstraction in the virtio-serial device. You might notice that this interface uses a phase bit rather than a valid bit like in the valid/ready interface for MMIO and DMA. Every transfer is initiated by flipping the phase bit from its previous value. I find the phase bit approach easier to work with because it distinguishes back-to-back transfers, whereas interfaces that allow the valid bit to stay 1 for back-to-back transfers are harder to debug. It would be possible to switch to a valid/ready interface though.
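
Here is a minimal Rust sketch of the phase bit idea from the responder's point of view. It is a toy model under assumptions of my own: PhaseReader is a hypothetical name, ready is modelled as a one-shot return value, and the real modules may hold ready differently.

```rust
/// Hypothetical model of a reader that answers phase-bit requests.
struct PhaseReader {
    seen_phase: bool, // phase value of the last request that was served
    source: Vec<u8>,  // bytes still waiting to be handed out
}

impl PhaseReader {
    /// Called every "clock cycle" with the requester's phase bit.
    /// Returns Some(byte), i.e. ready with data, at most once per phase flip.
    fn tick(&mut self, phase: bool) -> Option<u8> {
        if phase == self.seen_phase || self.source.is_empty() {
            return None; // no new request, or nothing to read yet
        }
        self.seen_phase = phase; // remember that this request was served
        Some(self.source.remove(0))
    }
}
```

Holding the phase bit at the same value never triggers a second transfer, so two back-to-back transfers are always separated by a visible flip. With a valid bit that stays at 1, the two transfers would look the same on a waveform.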

To summarize, there are 4 reader or writer implementations that can be connected freely through the rdwr_stream module:

  1. virtqueue_reader - reads buffers from the transmitq virtqueue (virtqueue 1).
  2. virtqueue_writer - writes buffers to the receiveq virtqueue (virtqueue 0).
  3. uart_reader - reads data from the UART rx pin.
  4. uart_writer - writes data to the UART tx pin.

Conclusion

The virtio-serial device consists of the VIRTIO MMIO Transport device registers plus two rdwr_streams that transfer data between virtqueues and the UART. The next post will look at how virtqueue processing works.

Building a virtio-serial FPGA device (Part 2): MMIO registers, DMA, and interrupts

This is the second post in a series about building a virtio-serial device in Verilog for an FPGA development board. This time we'll look at integrating MMIO devices into the PicoSoC (an open source System-on-Chip using the PicoRV32 RISC-V soft-core).

Series table of contents

  1. Part 1: Overview
  2. Part 2 - MMIO registers, DMA, and interrupts (you are here)
  3. Part 3 - virtio-serial device design
  4. Part 4 - Virtqueue processing
  5. Part 5 - UART receiver and transmitter
  6. Part 6 - Writing the RISC-V firmware

There are three common ways in which devices interact with a system:

  • Memory-mapped hardware registers let driver software running on the CPU communicate with the device. This is called MMIO.
  • Direct Memory Access (DMA) lets the device initiate RAM read or write accesses without tying up the CPU. This is typically used for bulk data transfers. An example is a network card receiving a packet into a memory buffer.
  • Interrupts allow the device to signal the CPU that an event has occurred.

PicoSoC supports MMIO device registers and interrupts out of the box. It does not support DMA, but I will explain later how this can be added by modifying the code.

Memory-mapped I/O registers

First let's look at implementing MMIO registers for a device in PicoSoC. The PicoRV32 CPU's memory interface looks like this:

output        mem_valid // request a memory transfer
output        mem_instr // hint that CPU is fetching an instruction
input         mem_ready // reply to memory transfer

output [31:0] mem_addr  // address
output [31:0] mem_wdata // data being written
output [ 3:0] mem_wstrb // write strobes, one bit per byte lane:
                        // 0000 - read
                        // 0001 - write 1 byte
                        // 0011 - write 2 bytes
                        // 0111 - write 3 bytes
                        // 1111 - write 4 bytes
input  [31:0] mem_rdata // data being read

When mem_valid is 1 the CPU is requesting a memory transfer. The memory address in mem_addr is decoded and the appropriate device is selected according to the memory map (e.g. virtio-serial device at 0x04000000-0x040000ff). The selected device then handles the memory transfer and asserts mem_ready to let the CPU know that the transfer has completed.
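
The address decoding and the write strobes can be modelled in a few lines of Rust. This is only a sketch: the virtio-serial range comes from the text above, while the RAM and flash ranges are assumptions derived from the board's 128 KB of RAM and 8 MB of flash, and the names are invented for illustration.

```rust
#[derive(Debug, PartialEq)]
enum Target {
    Ram,
    VirtioSerial,
    Flash,
    Unmapped,
}

/// Select a device from the address, like the decoding logic in the SoC.
fn decode(addr: u32) -> Target {
    match addr {
        0x0000_0000..=0x0001_ffff => Target::Ram,          // assumed: 128 KB at address 0
        0x0400_0000..=0x0400_00ff => Target::VirtioSerial, // range given in the text
        0x1000_0000..=0x107f_ffff => Target::Flash,        // assumed: 8 MB at the boot address
        _ => Target::Unmapped,
    }
}

/// Merge `wdata` into `old` for every byte lane whose strobe bit is set.
fn apply_wstrb(old: u32, wdata: u32, wstrb: u8) -> u32 {
    let mut mask = 0u32;
    for lane in 0..4u32 {
        if (wstrb & (1u8 << lane)) != 0 {
            mask |= 0xffu32 << (8 * lane);
        }
    }
    (old & !mask) | (wdata & mask)
}
```

A strobe of 0000 leaves the stored word untouched, which matches its meaning as a read.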

In order to handle MMIO device register accesses, the virtio-serial device needs a similar memory interface. The register logic is implemented in a case statement that handles wdata or rdata depending on the semantics of the register. Here is the VIRTIO MMIO MagicValue register implementation that reads a constant identifying this as a VIRTIO MMIO device:

module virtio_serial_mmio (
    ...
    input iomem_valid,
    output iomem_ready,
    input [3:0] iomem_wstrb,
    input [7:0] iomem_addr,
    input [31:0] iomem_wdata,
    output [31:0] iomem_rdata,
    ...
);
    ...
    always @(posedge clk) begin
        ...
        case (iomem_addr)
            `REG_MAGIC_VALUE: begin
                // Note that ready and rdata are basically iomem_ready
                // and iomem_rdata but there is some more glue behind
                // this.
                ready <= 1;
                rdata <= `MAGIC_VALUE;
            end
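
For comparison, here is a small Rust model of the read side of the first few VIRTIO MMIO registers. The register offsets and the magic value come from the VIRTIO specification. The version number is an assumption about this device, and mmio_read is a hypothetical function rather than anything in the project.

```rust
const REG_MAGIC_VALUE: u8 = 0x00;
const REG_VERSION: u8 = 0x04;
const REG_DEVICE_ID: u8 = 0x08;

/// Return the value of a read-only identification register, if modelled.
fn mmio_read(addr: u8) -> Option<u32> {
    match addr {
        REG_MAGIC_VALUE => Some(0x7472_6976), // "virt" as little-endian ASCII
        REG_VERSION => Some(2),               // assumed: a non-legacy device
        REG_DEVICE_ID => Some(3),             // the Console device type
        _ => None,                            // other registers are omitted here
    }
}
```

A driver checks the magic value first, and these identification registers are what the firmware's device type check ultimately reads.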

Direct Memory Access

MMIO registers are appropriate when the CPU needs to initiate some activity in the device, but they tie up the CPU during the load/store instructions that are accessing the device registers. For bulk data transfer it is common to use DMA instead, where a device initiates RAM data transfers itself without CPU involvement. This allows the CPU to continue running independently of device activity.

VIRTIO is built around DMA because the virtqueues live in RAM and the device initiates accesses to both the virtqueue data structures as well as the actual data buffers containing the I/O payload.

The iCESugar board has Single Port RAM (SPRAM), which means that it can only be accessed through one interface and that is already connected to the CPU. In order to allow the virtio-serial device to access RAM, it is necessary to multiplex the SPRAM interface between the CPU and the virtio-serial device. I chose to implement a fixed-priority arbiter to do this because a fancier round-robin strategy is not necessary for this project. The virtio-serial device will only access RAM in short bursts, so the CPU will not be starved.

You can look at the spram_mux module to see the implementation, but it basically has 2 input memory interfaces and 1 output memory interface. One input interface is high priority and the other is low priority. The virtio-serial device uses the high priority port and the CPU uses the low priority port.
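
The core decision of a fixed-priority arbiter fits in a few lines. This Rust sketch only shows the grant logic with made-up names. It leaves out what a real multiplexer also has to do, such as routing the address and data signals and not switching ports in the middle of a transfer.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Grant {
    None,
    HighPriority, // the virtio-serial device's DMA port
    LowPriority,  // the CPU's port
}

/// Decide which port may access the single-port RAM.
fn arbitrate(high_valid: bool, low_valid: bool) -> Grant {
    if high_valid {
        Grant::HighPriority // the high priority port always wins
    } else if low_valid {
        Grant::LowPriority
    } else {
        Grant::None
    }
}
```

A round-robin arbiter would additionally have to remember who was served last, which costs extra logic cells for fairness that this design does not need.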

The virtio-serial device is designed for DMA via a state machine that keeps track of the current memory access that is being performed. When the device sees the ready input asserted, it knows the DMA transfer has completed and it transitions to the next state (often multiple memory accesses are performed in sequence to load the virtqueue data structures).

For example, here are state machine transitions for loading the first two fields of the virtqueue descriptor:

always @(posedge clk) begin
    ...
    if (ram_valid && ram_ready) begin
        ...
        case (state)
        ...
        `STATE_READ_DESCRIPTOR_ADDR_LOW: begin
            desc_addr_low <= ram_rdata;
            ram_addr <= ram_addr + 2;
            state <= `STATE_READ_DESCRIPTOR_LEN;
        end
        `STATE_READ_DESCRIPTOR_LEN: begin
            desc_len <= ram_rdata;
            ram_addr <= ram_addr + 1;
            state <= `STATE_READ_DESCRIPTOR_FLAGS_NEXT;
        end

When the DMA transfer completes in the STATE_READ_DESCRIPTOR_ADDR_LOW state, the low 32 bits of the virtqueue descriptor's buffer address are stored into the desc_addr_low register for later use and ram_addr is updated to the memory address of the virtqueue descriptor's length field. The STATE_READ_DESCRIPTOR_LEN state has similar logic.

In other words, DMA transfers require splitting up the device implementation into a state machine that handles DMA completion in a future clock cycle. In the software world this is similar to callbacks in event loops where code is split up because we need to wait for a completion.
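
To show the callback analogy, here is the same descriptor load written as an event-driven Rust state machine. It is a sketch under one assumption that is inferred from the increments in the Verilog excerpt rather than stated in the post: ram_addr counts 32-bit words, so the length field is 2 words after the low address word. DescriptorReader and on_ready are hypothetical names.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum State {
    ReadDescriptorAddrLow,
    ReadDescriptorLen,
    ReadDescriptorFlagsNext,
    Done,
}

struct DescriptorReader {
    state: State,
    ram_addr: usize, // assumed to be a word address of the outstanding read
    desc_addr_low: u32,
    desc_len: u32,
    desc_flags_next: u32,
}

impl DescriptorReader {
    fn new(desc_word_addr: usize) -> Self {
        DescriptorReader {
            state: State::ReadDescriptorAddrLow,
            ram_addr: desc_word_addr,
            desc_addr_low: 0,
            desc_len: 0,
            desc_flags_next: 0,
        }
    }

    /// The "completion callback": called when the read at `ram_addr` finishes.
    fn on_ready(&mut self, ram_rdata: u32) {
        match self.state {
            State::ReadDescriptorAddrLow => {
                self.desc_addr_low = ram_rdata;
                self.ram_addr += 2; // skip the high 32 bits of the address
                self.state = State::ReadDescriptorLen;
            }
            State::ReadDescriptorLen => {
                self.desc_len = ram_rdata;
                self.ram_addr += 1;
                self.state = State::ReadDescriptorFlagsNext;
            }
            State::ReadDescriptorFlagsNext => {
                self.desc_flags_next = ram_rdata; // flags and next share one word
                self.state = State::Done;
            }
            State::Done => {}
        }
    }
}
```

The caller plays the role of the memory: it looks up the word at ram_addr and calls on_ready with it, once per completed transfer, until the state reaches Done.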

Interrupts

The PicoRV32 soft-core has basic interrupt support, but it does not implement the standard RISC-V Control and Status Registers (CSRs) for interrupt handling. Supporting this would require extra work on the firmware side because the existing riscv-rt Rust crate doesn't implement the PicoRV32 interrupt mechanism. Also, I ended up running low on logic cells in the FPGA, so I disabled the PicoRV32's optional interrupt support to save space. Luckily VIRTIO devices support busy waiting, so interrupts are not required.

Conclusion

This post described how the virtio-serial device is connected to the PicoSoC and how MMIO registers and DMA work. MMIO register implementation was easy, but I spent quite a bit of time debugging waveforms with GTKWave to make sure that the memory interface and spram_mux were both working correctly and not wasting clock cycles. In the next post we'll look at the design of the virtio-serial device.

Building a virtio-serial FPGA device (Part 1): Overview

This is the first post in a series about building a virtio-serial device in Verilog for a Field Programmable Gate Array (FPGA) development board. This was a project I did in my spare time to become familiar with logic design. I hope these blog posts will offer a glimpse into designing your own devices and FPGA development.

Series table of contents

  1. Part 1: Overview (you are here)
  2. Part 2 - MMIO registers, DMA, and interrupts
  3. Part 3 - virtio-serial device design
  4. Part 4 - Virtqueue processing
  5. Part 5 - UART receiver and transmitter
  6. Part 6 - Writing the RISC-V firmware

Having developed systems software including firmware, device drivers for Linux, and device emulation in QEMU, I wanted to implement a device from scratch on an FPGA, leaving the comfort of the software world and getting some experience with hardware internals. And it didn't take long before I got both the good and the bad experiences. For example, I found out what a pain it becomes when a device has to process data structures that are not aligned in memory! More on that later.

A few years ago, I ordered a development board with an iCE40UP5k FPGA with the intention of implementing a CPU and maybe a USB controller. I was busy with other things though and the FPGA ended up in a drawer until I recently felt the time was right to dive in.

The muselab iCESugar board that I used for this project costs around 50 USD. It does not support high-speed interfaces like PCIe or Ethernet, but it has 5280 logic cells, 128 KB RAM, 8 MB of flash memory, and a collection of basic I/O including onboard LEDs, UART pins, and PMOD headers. That puts it roughly on par with an Arduino microcontroller board, except you're not stuck with a particular microcontroller because you can design your own or use existing soft-cores, as they are called.

The board can be flashed via USB and loading the manufacturer's demos was an eye opener: it can run several different CPU soft-cores (RISC-V, 6502, etc) and there is even enough capacity to run MicroPython on a soft-core. Typing Python into the prompt and getting output back knowing that the CPU it is running on is just some Verilog code that you can read and modify is neat.

Out of the available demo soft-cores, the PicoRV32 RISC-V soft-core interested me most because it's a 32-bit microcontroller with open source compiler toolchain support despite the Verilog implementation being tiny. You can write firmware for the PicoRV32 in Rust, C, etc.

A tiny soft-core is important because it leaves logic cells free for integrating custom devices. There is no point in a fancier soft-core if it complicates the project or limits the number of cells available for my own logic.

The PicoRV32 code comes with an example System-on-Chip (SoC) called PicoSoC that integrates RAM, flash, and UART serial port communication. Custom memory-mapped I/O (MMIO) devices can be wired into the SoC by adding address decoding logic and connecting the devices to the bus. PicoSoC is a great time-saver for developing a custom RV32 SoC because RAM and flash are critical but not particularly exciting to integrate yourself.

The PicoSoC exposes a trivial MMIO register interface for the UART, but I wanted to replace it with a virtio-serial device in order to learn about implementing a more advanced device. VIRTIO devices use Direct Memory Access (DMA) and interrupts, although I ended up not implementing interrupts due to running out of logic cells in the end. This provides an opportunity to implement a device from scratch that is small but not trivial.

While PicoSoC has no PCI bus for the popular VIRTIO PCI transport, it is possible to implement the VIRTIO MMIO transport for this SoC since that just involves selecting some address space for the device's registers where the PicoRV32 CPU can access the device.

Having covered all this, the goal of this project is to write a virtio-serial device in Verilog and integrate it into PicoSoC. This also requires writing firmware that runs on the PicoRV32 soft-core to prove that the virtio-serial device works. In the posts that follow, I'll describe the main stops on the journey to building this.

The next post will cover MMIO registers, DMA, and interrupts.

You can also check out the code for this project at https://gitlab.com/stefanha/virtio-serial-fpga.