This is the second post in a series about building a virtio-serial device in Verilog for an FPGA development board. This time we'll look at integrating MMIO devices into the PicoSoC (an open source System-on-Chip using the PicoRV32 RISC-V soft-core).
Series table of contents
- Part 1 - Overview
- Part 2 - MMIO registers, DMA, and interrupts (you are here)
- Part 3 - virtio-serial device design
- Part 4 - Virtqueue processing
- Part 5 - UART receiver and transmitter
- Part 6 - Writing the RISC-V firmware
There are three common ways in which devices interact with a system:
- Memory-mapped hardware registers let driver software running on the CPU communicate with the device. This is called MMIO.
- Direct Memory Access (DMA) lets the device initiate RAM read or write accesses without tying up the CPU. This is typically used for bulk data transfers. An example is a network card receiving a packet into a memory buffer.
- Interrupts allow the device to signal the CPU that an event has occurred.
PicoSoC supports MMIO device registers and interrupts out of the box. It does not support DMA, but I will explain later how it can be added by modifying the code.
Memory-mapped I/O registers
First let's look at implementing MMIO registers for a device in PicoSoC. The PicoRV32 CPU's memory interface looks like this:
output mem_valid // request a memory transfer
output mem_instr // hint that CPU is fetching an instruction
input mem_ready // reply to memory transfer
output [31:0] mem_addr // address
output [31:0] mem_wdata // data being written
output [ 3:0] mem_wstrb // byte-lane write enables (one bit per byte):
                        // 0000 - read
                        // 0001 - write byte 0 only (also 0010, 0100, 1000)
                        // 0011 - write lower 2 bytes
                        // 1100 - write upper 2 bytes
                        // 1111 - write all 4 bytes
input [31:0] mem_rdata // data being read
When mem_valid is 1 the CPU is requesting a memory transfer. The memory address in mem_addr is decoded and the appropriate device is selected according to the memory map (e.g. virtio-serial device at 0x04000000-0x040000ff). The selected device then handles the memory transfer and asserts mem_ready to let the CPU know that the transfer has completed.
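The decode step described above can be sketched roughly as follows. This is not PicoSoC's actual code: the module name, the `ram_*` and `virtio_*` signal names, and the assumption that RAM sits at the bottom of the address space are all hypothetical. Only the virtio-serial window (0x04000000-0x040000ff) comes from the example above.

```verilog
// Sketch only: names and the RAM location are assumptions for illustration.
module mmio_decode_sketch (
    input         mem_valid,
    input  [31:0] mem_addr,
    // Replies from the two targets
    input         ram_ready,
    input  [31:0] ram_rdata,
    input         virtio_ready,
    input  [31:0] virtio_rdata,
    // Per-target select lines (act as the target's valid input)
    output        ram_sel,
    output        virtio_sel,
    // Reply routed back to the CPU
    output        mem_ready,
    output [31:0] mem_rdata
);
    // virtio-serial window: 0x04000000-0x040000ff (upper 24 bits match)
    assign virtio_sel = mem_valid && (mem_addr[31:8] == 24'h040000);

    // Assumption: RAM occupies addresses whose top byte is 0x00
    assign ram_sel = mem_valid && (mem_addr[31:24] == 8'h00);

    // Only the selected target may complete the transfer
    assign mem_ready = (ram_sel && ram_ready) ||
                       (virtio_sel && virtio_ready);

    // Route read data from whichever target was selected
    assign mem_rdata = virtio_sel ? virtio_rdata :
                       ram_sel    ? ram_rdata    : 32'h0;
endmodule
```

Note that in this sketch an access to an unmapped address never receives mem_ready, so the CPU would stall. A real design needs to decide how to handle that case.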
In order to handle MMIO device register accesses, the virtio-serial device needs a similar memory interface. The register logic is implemented in a case statement that handles wdata or rdata depending on the semantics of the register. Here is the VIRTIO MMIO MagicValue register implementation that reads a constant identifying this as a VIRTIO MMIO device:
module virtio_serial_mmio (
    ...
    input iomem_valid,
    output iomem_ready,
    input [3:0] iomem_wstrb,
    input [7:0] iomem_addr,
    input [31:0] iomem_wdata,
    output [31:0] iomem_rdata,
    ...
);
    ...
    always @(posedge clk) begin
        ...
        case (iomem_addr)
            `REG_MAGIC_VALUE: begin
                // Note that ready and rdata are basically iomem_ready
                // and iomem_rdata but there is some more glue behind
                // this.
                ready <= 1;
                rdata <= `MAGIC_VALUE;
            end
            ...
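To make the "glue" around ready and rdata more concrete, here is a minimal self-contained sketch of a register block that could sit behind such an interface. It is not the actual virtio-serial source. The module name, the reset handling, and the REG_SCRATCH register are hypothetical (REG_SCRATCH is not part of the VIRTIO specification and is only there to show a write that honours the byte-lane strobes). The MagicValue offset (0x000) and value (0x74726976) are taken from the VIRTIO MMIO transport.

```verilog
// Sketch only: module name, REG_SCRATCH, and reset handling are hypothetical.
module mmio_regs_sketch (
    input         clk,
    input         resetn,
    input         iomem_valid,
    output        iomem_ready,
    input  [ 3:0] iomem_wstrb,
    input  [ 7:0] iomem_addr,
    input  [31:0] iomem_wdata,
    output [31:0] iomem_rdata
);
    localparam [7:0]  REG_MAGIC_VALUE = 8'h00;        // VIRTIO MMIO MagicValue offset
    localparam [31:0] MAGIC_VALUE     = 32'h74726976; // "virt" as little-endian ASCII
    localparam [7:0]  REG_SCRATCH     = 8'hf0;        // hypothetical read/write register

    reg        ready;
    reg [31:0] rdata;
    reg [31:0] scratch;

    assign iomem_ready = ready;
    assign iomem_rdata = rdata;

    always @(posedge clk) begin
        // ready is a one-cycle pulse: default to 0 every cycle
        ready <= 0;

        if (!resetn) begin
            scratch <= 32'h0;
        end else if (iomem_valid && !ready) begin
            // Complete the transfer on the next clock edge
            ready <= 1;

            case (iomem_addr)
                REG_MAGIC_VALUE: begin
                    // Read-only constant; writes are ignored
                    rdata <= MAGIC_VALUE;
                end
                REG_SCRATCH: begin
                    // Update only the byte lanes selected by the strobes
                    if (iomem_wstrb[0]) scratch[ 7: 0] <= iomem_wdata[ 7: 0];
                    if (iomem_wstrb[1]) scratch[15: 8] <= iomem_wdata[15: 8];
                    if (iomem_wstrb[2]) scratch[23:16] <= iomem_wdata[23:16];
                    if (iomem_wstrb[3]) scratch[31:24] <= iomem_wdata[31:24];
                    rdata <= scratch;
                end
                default: begin
                    // Unknown registers read as zero
                    rdata <= 32'h0;
                end
            endcase
        end
    end
endmodule
```

The `!ready` guard keeps a single CPU access from being handled twice, since iomem_valid is still asserted during the cycle in which ready is high.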
Direct Memory Access
MMIO registers are appropriate when the CPU needs to initiate some activity in the device, but they tie up the CPU during the load/store instructions that access the device registers. For bulk data transfer it is common to use DMA instead, where a device initiates RAM data transfers itself without CPU involvement. This allows the CPU to continue running independently of device activity.
VIRTIO is built around DMA because the virtqueues live in RAM and the device initiates accesses to both the virtqueue data structures and the actual data buffers containing the I/O payload.
The iCESugar board has Single Port RAM (SPRAM), which means that it can only be accessed through one interface and that is already connected to the CPU. In order to allow the virtio-serial device to access RAM, it is necessary to multiplex the SPRAM interface between the CPU and the virtio-serial device. I chose to implement a fixed-priority arbiter to do this because a fancier round-robin strategy is not necessary for this project. The virtio-serial device will only access RAM in short bursts, so the CPU will not be starved.
You can look at the spram_mux module to see the implementation, but it basically has 2 input memory interfaces and 1 output memory interface. One input interface is high priority and the other is low priority. The virtio-serial device uses the high priority port and the CPU uses the low priority port.
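For readers who want a feel for the idea without opening the source, a fixed-priority arbiter along these lines might look like the sketch below. It is not the real spram_mux: the module name, port names, and the busy/owner_hi lock (which prevents the high priority port from taking over in the middle of a low priority transfer) are assumptions made for illustration.

```verilog
// Sketch only: not the actual spram_mux; all names are hypothetical.
module fixed_priority_mux_sketch (
    input         clk,
    input         resetn,
    // High priority input port (e.g. the device)
    input         hi_valid,
    output        hi_ready,
    input  [31:0] hi_addr,
    input  [31:0] hi_wdata,
    input  [ 3:0] hi_wstrb,
    output [31:0] hi_rdata,
    // Low priority input port (e.g. the CPU)
    input         lo_valid,
    output        lo_ready,
    input  [31:0] lo_addr,
    input  [31:0] lo_wdata,
    input  [ 3:0] lo_wstrb,
    output [31:0] lo_rdata,
    // Output port towards the RAM
    output        out_valid,
    input         out_ready,
    output [31:0] out_addr,
    output [31:0] out_wdata,
    output [ 3:0] out_wstrb,
    input  [31:0] out_rdata
);
    reg busy;      // a granted transfer has not completed yet
    reg owner_hi;  // which port owns the in-flight transfer

    // While idle the high priority port wins; while busy the grant is held
    wire grant_hi = busy ? owner_hi : hi_valid;

    assign out_valid = grant_hi ? hi_valid : lo_valid;
    assign out_addr  = grant_hi ? hi_addr  : lo_addr;
    assign out_wdata = grant_hi ? hi_wdata : lo_wdata;
    assign out_wstrb = grant_hi ? hi_wstrb : lo_wstrb;

    // Only the granted port sees the completion
    assign hi_ready = grant_hi  && hi_valid && out_ready;
    assign lo_ready = !grant_hi && lo_valid && out_ready;

    // Read data is broadcast; ready tells each port whether it is valid
    assign hi_rdata = out_rdata;
    assign lo_rdata = out_rdata;

    always @(posedge clk) begin
        if (!resetn) begin
            busy     <= 0;
            owner_hi <= 0;
        end else if (out_valid && out_ready) begin
            // Transfer finished: arbitrate afresh next cycle
            busy <= 0;
        end else if (out_valid) begin
            // Transfer still in flight: lock the current grant
            busy     <= 1;
            owner_hi <= grant_hi;
        end
    end
endmodule
```

If the RAM always completes a transfer in the same cycle it is requested, the lock is never engaged and the sketch reduces to a plain combinational multiplexer.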
The virtio-serial device is designed for DMA via a state machine that keeps track of the current memory access that is being performed. When the device sees the ready input asserted, it knows the DMA transfer has completed and it transitions to the next state (often multiple memory accesses are performed in sequence to load the virtqueue data structures).
For example, here are the state machine transitions for loading the first two fields of the virtqueue descriptor:
always @(posedge clk) begin
    ...
    if (ram_valid && ram_ready) begin
        ...
        case (state)
            ...
            `STATE_READ_DESCRIPTOR_ADDR_LOW: begin
                desc_addr_low <= ram_rdata;
                // Skip the high 32 bits of the address to reach the length field
                ram_addr <= ram_addr + 2;
                state <= `STATE_READ_DESCRIPTOR_LEN;
            end
            `STATE_READ_DESCRIPTOR_LEN: begin
                desc_len <= ram_rdata;
                // Advance to the word holding the flags and next fields
                ram_addr <= ram_addr + 1;
                state <= `STATE_READ_DESCRIPTOR_FLAGS_NEXT;
            end
            ...
When the DMA transfer completes in the STATE_READ_DESCRIPTOR_ADDR_LOW state, the low 32 bits of the virtqueue descriptor's buffer address are stored into the desc_addr_low register for later use and ram_addr is updated to the memory address of the virtqueue descriptor's length field. The STATE_READ_DESCRIPTOR_LEN state has similar logic.
In other words, DMA transfers require splitting up the device implementation into a state machine that handles DMA completion in a future clock cycle. In the software world this is similar to callbacks in event loops where code is split up because we need to wait for a completion.
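To show the whole pattern end to end, including how a request is started and finished, here is a minimal self-contained sketch of a state machine that reads two consecutive words over a valid/ready memory port. It is not part of the virtio-serial device: the module name, the start, base_addr, word0, word1, and done signals, and the state names are hypothetical. It assumes ram_addr is a word address and that ram_ready is asserted for exactly one cycle per completed transfer.

```verilog
// Sketch only: names are hypothetical; assumes word addressing and a
// one-cycle ram_ready pulse per completed transfer.
module dma_read2_sketch (
    input             clk,
    input             resetn,
    input             start,      // pulse to begin the two-word read
    input      [31:0] base_addr,  // word address of the first word
    // Memory port (device is the initiator)
    output reg        ram_valid,
    input             ram_ready,
    output reg [31:0] ram_addr,
    input      [31:0] ram_rdata,
    // Results
    output reg [31:0] word0,
    output reg [31:0] word1,
    output reg        done        // one-cycle pulse when both words are loaded
);
    localparam [1:0] STATE_IDLE       = 2'd0;
    localparam [1:0] STATE_READ_WORD0 = 2'd1;
    localparam [1:0] STATE_READ_WORD1 = 2'd2;

    reg [1:0] state;

    always @(posedge clk) begin
        done <= 0;

        if (!resetn) begin
            state     <= STATE_IDLE;
            ram_valid <= 0;
        end else if (state == STATE_IDLE) begin
            if (start) begin
                // Issue the first read request
                ram_addr  <= base_addr;
                ram_valid <= 1;
                state     <= STATE_READ_WORD0;
            end
        end else if (ram_valid && ram_ready) begin
            // A transfer completed: the "callback" for the current state
            case (state)
                STATE_READ_WORD0: begin
                    word0    <= ram_rdata;
                    ram_addr <= ram_addr + 1; // request the next word
                    state    <= STATE_READ_WORD1;
                end
                STATE_READ_WORD1: begin
                    word1     <= ram_rdata;
                    ram_valid <= 0;           // no further requests
                    done      <= 1;
                    state     <= STATE_IDLE;
                end
                default: begin
                    ram_valid <= 0;
                    state     <= STATE_IDLE;
                end
            endcase
        end
    end
endmodule
```

Each case branch plays the role of a completion callback: it consumes ram_rdata for the transfer that just finished and sets up the next one.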
Interrupts
The PicoRV32 soft-core has basic interrupt support, but it does not implement the standard RISC-V Control and Status Registers (CSRs) for interrupt handling. Using PicoRV32's custom interrupt mechanism would require extra work on the firmware side because the existing riscv-rt Rust crate doesn't implement it. Also, I ended up running low on logic cells in the FPGA, so I disabled the PicoRV32's optional interrupt support to save space. Luckily VIRTIO devices support busy waiting, so interrupts are not required.
Conclusion
This post described how the virtio-serial device is connected to the PicoSoC and how MMIO registers and DMA work. MMIO register implementation was easy, but I spent quite a bit of time debugging waveforms with GTKWave to make sure that the memory interface and spram_mux were both working correctly and not wasting clock cycles. In the next post we'll look at the design of the virtio-serial device.