Designing Memory-mapped Peripheral IPs in RTL

Significance

One characteristic of a well-designed IP is a standard IO interface. An IP core designed in bare RTL typically has a custom interface with control, status, and data ports. For ease of integration into a system or SoC, it is convenient to wrap the core in an industry-standard bus interface, converting the bare RTL IP into a memory-mapped peripheral. The IP can then be attached to a system bus or low-speed peripheral bus as a slave, and accessed through the address space by a bus master such as a processor. A few of the advantages are:

Makes the IP flexible, portable, and easy to integrate into a system/SoC. ‘Plug and play’ into any system which supports the bus interface with minimal/no modification.
The processor need not worry about communicating with the IP at signal level (the lowest level). From the processor’s perspective, the peripheral is just like a ‘memory’ which it can either write to or read from.
Firmware friendly; drivers/firmware accessing the IP need not be aware of the protocols/handshaking at signal level. Instead, the complexity is broken down to: Which register is to be written? With what value, and when? Which register is to be read, and when?
Historically, less area and power than IO-mapped peripherals, at the cost of access speed. That is fine as long as accesses to the IP need little bandwidth compared to the system speed.

How to convert an IP to a simple Memory-mapped Peripheral

So, today let’s try to absorb the idea presented above and understand how to convert a simple IP in RTL to a memory-mapped peripheral.

Today’s agenda in the blog is as follows:

Design a simple GCD Calculator (gcd_ip) in RTL to compute GCD of two 8-bit numbers and output the result. Custom interface with Valid-Ready handshaking shall be used.
Define a standard IO interface to be used as a wrapper on top of the GCD Calculator core. Some of the standard and popular interfaces are the lightweight AMBA interfaces APB, AHB-Lite, and AXI4-Lite, as well as Avalon. Let’s go ahead with the simplest one: the APB Slave interface. Refer to the AMBA APB specification.
Define Register space / Addressing space i.e., set of registers to control/access IP (giving input, reading output….).
Design logic to read and write to registers.
Design glue logic to decode and map register accesses to the custom interface/protocol defined by the IP.

GCD Calculator

We won’t go inside the design of this simple module, but rather treat it as a black box whose functionality and IO interface are known. In case you are curious, the core uses the good old Euclidean algorithm to compute the GCD of two numbers. The IP supports 8-bit numbers and Valid-Ready handshaking at both input and output.
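For reference, the function the black box computes can be sketched in software. This is only a minimal C model of the Euclidean algorithm on 8-bit operands, not the RTL of the core; the name gcd_ref is hypothetical, and returning the non-zero operand when the other is zero is an assumption, since the text does not say how the core treats zero inputs.

```c
#include <stdint.h>

/* Software reference model of the core's function (not its RTL):
 * iterative Euclidean algorithm on two 8-bit operands.
 * Assumption: gcd_ref(x, 0) returns x; the behaviour of the real
 * core for zero operands is not specified in the text. */
uint8_t gcd_ref(uint8_t a, uint8_t b)
{
    while (b != 0) {
        uint8_t r = (uint8_t)(a % b);  /* remainder step */
        a = b;
        b = r;
    }
    return a;
}
```

A model like this is handy in a test bench or driver test to check the value read back from the peripheral.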

GCD Calculator IP with native interface

As you know, data is sampled at an input/output only when both valid and ready are high. The inputs to GCD Calculator are a and b, qualified by i_valid, and the output is gcd, qualified by o_valid. The signal o_ready is driven by the IP to flag that it is ready to accept inputs. Once the operation is finished, the output is driven along with o_valid. The signal i_ready is driven to the IP to flag that the output is being read out. This is a non-pipelined IP with no internal buffering, so the output has to be read out once it becomes valid. Until then, the IP stalls its operation.

Design APB Slave wrapper for GCD Calculator IP

Let’s employ the APB Slave interface to wrap the IP, as we discussed earlier. APB is the most lightweight bus protocol in AMBA, as it requires a minimal number of signals and simple handshaking. The idea of wrapping looks something like this:

APB wrapper around GCD Calculator

APB Slave interface is used to read and write to a set of registers (peripheral registers).

Mapping Logic is the glue logic between the registers and the native interface of GCD Calculator.

We have to decide the APB address and data widths. If we assume that the IP is going to be integrated into a system with a 32-bit byte-addressable address space, the data width and the peripheral registers’ size may be fixed at 32 bits. Let’s decide the address width later on.

Register Space

Before defining register space, we should specify the run-time configurability and accessibility we want to provide in the IP. Let’s define following requirements:

Feature to enable/disable the IP.
Inputs a, b should be configurable.
Output gcd should be readable.
Should flag if the IP is busy or not (ready to accept inputs?).
Should flag when the output becomes valid to be read.

After contemplating the configurability/accessibility requirements and analyzing the native interface of GCD Calculator, let us define the set of registers for the IP, with their addresses and functionality. Each register is 32 bits wide and assumes 0x0 on reset.

S No	Address	Register	Access type	Description
1	0x00	control	RW	To control the IP operation. [0] : Enable/Disable
2	0x04	status	RO	Status of the IP. [0] : Output valid or not [1] : Ready to accept inputs or not
3	0x08	data_in	RW	Inputs to the IP. [15 : 0] : {a, b}
4	0x0C	data_out	RO	Output of the IP. [7 : 0] : gcd

Register Space of GCD Calculator IP

That’s it! Four registers have been defined in the register space to control the access of IP. All unused bits in the registers may be reserved for future use. By the way, you can use your imagination to tweak this register space in different ways!
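On the firmware side, the register space above could be captured in a small header. The following is a sketch only: the macro and function names are hypothetical, while the offsets and bit positions are taken from the table.

```c
#include <stdint.h>

/* Offsets from the peripheral base address (from the register table). */
#define GCD_CTRL_OFFSET     0x00u
#define GCD_STS_OFFSET      0x04u
#define GCD_DATAIN_OFFSET   0x08u
#define GCD_DATAOUT_OFFSET  0x0Cu

/* Bit fields (names are illustrative, positions are from the table). */
#define GCD_CTRL_ENABLE     (1u << 0)  /* control [0]: enable/disable   */
#define GCD_STS_OUT_VALID   (1u << 0)  /* status  [0]: output valid     */
#define GCD_STS_IN_READY    (1u << 1)  /* status  [1]: ready for inputs */

/* data_in [15:0] = {a, b}: a in bits [15:8], b in bits [7:0]. */
static inline uint32_t gcd_pack_inputs(uint8_t a, uint8_t b)
{
    return ((uint32_t)a << 8) | (uint32_t)b;
}

/* data_out [7:0] = gcd; upper bits are reserved and ignored here. */
static inline uint8_t gcd_unpack_output(uint32_t data_out)
{
    return (uint8_t)(data_out & 0xFFu);
}
```

For example, gcd_pack_inputs(0x12, 0x34) yields 0x1234, which is the word to be written to data_in.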

Addressing

Since we assumed that the address space is byte-addressable (the most commonly used addressing scheme), each address holds one byte. With a 32-bit data width, read/write accesses are typically 32 bits wide, i.e., four bytes at a time. Therefore, the addresses of 32-bit registers have to be multiples of four. Such addresses are called 32-bit aligned addresses.

This explains why the register addresses in the register space of GCD Calculator are $\text{0x00}$, $\text{0x04}$, $\text{0x08}$, $\text{0x0C}$ instead of consecutive values such as $\text{0x00}$, $\text{0x01}$, $\text{0x02}$, $\text{0x03}$. These are called offset addresses, and they identify which register of a peripheral is to be accessed. In the system, the peripheral is assigned a base address, say $\text{0x30000000}$, so that a bus master can uniquely identify the peripheral to be accessed. So in the system or SoC, our IP’s register space spans $\text{0x30000000}$ to $\text{0x3000000F}$. A bus master such as a processor sends this full or absolute address (base + offset) to access a peripheral register. However, we IP designers worry only about offset addresses, as this is what is received at the IP’s bus interface (APB in our case) after decoding by a bus bridge or interconnect in the system.

Accessing status register of our IP by a processor in an SoC looks like this for example:

Accessing gcd_ip in an SoC

The last register’s address in the register space of GCD Calculator is $\text{0x0C}$. Therefore, an address width of four bits is sufficient. Internally, however, the two MSbs are enough to decode the addresses, as the two LSbs are always zero in this addressing scheme (which optimizes the address decoder!).
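The address arithmetic described above can be illustrated with a short C sketch. The helper names are hypothetical, the base address is the example value used in the text, and the decode simply mirrors the observation that only bits [3:2] of the offset distinguish the four registers.

```c
#include <stdint.h>

#define GCD_BASE_ADDR 0x30000000u  /* example base address from the text */

/* Absolute address sent by the bus master = base + offset. */
static inline uint32_t gcd_abs_addr(uint32_t offset)
{
    return GCD_BASE_ADDR + offset;
}

/* Register index as a wrapper could decode it: bits [1:0] are always
 * zero for aligned accesses, so only bits [3:2] are examined.
 * 0 = control, 1 = status, 2 = data_in, 3 = data_out. */
static inline unsigned gcd_reg_index(uint32_t paddr)
{
    return (unsigned)((paddr >> 2) & 0x3u);
}
```

With these helpers, the status register sits at absolute address 0x30000004, and the offset 0x0C decodes to index 3 (data_out).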

Read/Write Logic for Registers

Once the register space is defined, next step is to design read/write logic:

To read all RW and RO registers.
To write to all RW registers. Since RO registers are updated exclusively by the IP, these registers should be write-protected from external world. Any write access to these registers is hence deemed invalid in our design.

The logic should decode address and/or data from APB signals and do either read/write to the addressed register.
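The intended behaviour of this read/write logic can be modelled in a few lines of C. This is a behavioural sketch, not the RTL: the type and function names are hypothetical, and reporting a rejected write through a boolean return value is an assumption, since the text only says that writes to RO registers are invalid.

```c
#include <stdint.h>
#include <stdbool.h>

/* reg[0] = control, reg[1] = status, reg[2] = data_in, reg[3] = data_out */
typedef struct { uint32_t reg[4]; } gcd_regs_t;

/* Bus write: accepted for RW registers, rejected for RO registers.
 * Returns true if the write took effect. */
bool gcd_regs_write(gcd_regs_t *r, uint32_t offset, uint32_t wdata)
{
    unsigned idx = (unsigned)((offset >> 2) & 0x3u);
    if (idx == 1u || idx == 3u) {
        return false;          /* status and data_out are write-protected */
    }
    r->reg[idx] = wdata;
    return true;
}

/* Bus read: every register, RW or RO, is readable. */
uint32_t gcd_regs_read(const gcd_regs_t *r, uint32_t offset)
{
    return r->reg[(offset >> 2) & 0x3u];
}
```

In this model, a write to offset 0x04 leaves the status register untouched, while a write to offset 0x00 updates the control register.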

Mapping Logic

This is the last part of our design process and probably the trickiest part as well while writing a wrapper.

Mapping Logic, as I call it, is responsible for decoding and mapping the register accesses to the native interface of the IP. Mapping Logic has to decode the data in the registers and drive the input ports of the IP as and when required. It also has to read the different statuses of the IP and update them in the corresponding registers as and when necessary. Apart from the above, Mapping Logic should take care of all signal-level handshaking and timing, and comply with all protocols defined by the IP.

So let’s see how to crack this in case of our IP! Break down each mapping logic by listing down the purpose of each register:

control register: Writing $\text{0x01}$ should enable the IP. The IP doesn’t have a dedicated enable port, so we can implement a work-around such as keeping the IP in reset until $\text{0x01}$ is written to the register. If reset is an active-low reset, the logic to be implemented is as simple as:

assign gcd_ip_resetn = global_resetn & control_register [0] ;

status register: The IP status signals should be updated in this register synchronously.

status_register [0] <= gcd_ip_o_valid ;
status_register [1] <= gcd_ip_o_ready ;

data_in register: Data written in this register should go to the IP as inputs a, b.

assign gcd_ip_a = data_in_register [15 : 8] ;
assign gcd_ip_b = data_in_register [7 : 0]  ;

data_out register: The output gcd from IP should be updated in this register synchronously.

data_out_register <= gcd_ip_gcd ;

That’s all the basic stuff mapping logic has to implement!

But………… Is it finished yet?

If you have followed the whole design process closely, you will have noticed that two crucial signals at the IP interface are yet to be mapped. They are the gcd_ip_i_valid and gcd_ip_i_ready ports of the IP, which are the input valid and input ready signals. We have set up the complete data path mapping logic. What remains is to implement the Valid-Ready handshaking correctly to complete the design. Otherwise, there is no assurance that input/output data has been sampled correctly, which opens the door to missing, duplicated, or corrupted data.

There are two points to be kept in mind before we can think about how to implement the handshaking:

Inputs a and b may be written in data_in register. But this is not enough for the IP to start operation. It has to be flagged as valid data by asserting gcd_ip_i_valid. Following handshaking is required with gcd_ip_o_ready at the IP’s native interface:

Valid-Ready Handshaking at input

Output may be read out from data_out register if gcd_ip_o_valid bit is set in status register. But this is not enough to resume the operation of the IP (which is stalling now). Reading the output should be flagged to the IP by asserting gcd_ip_i_ready. Following handshaking is required with gcd_ip_o_valid at the IP’s native interface:

Valid-Ready Handshaking at output

So, does that mean we need separate control bits in register space to drive these valid and ready signals? No! We cannot implement cycle-level accurate handshaking by manipulating these bits in every clock cycle through APB interface. Besides, that would increase the complexity of controlling the IP from the perspective of Bus Master/Firmware.

Instead, let us exploit the designer’s freedom and enforce a simple protocol or set of “rules” at register access level which is both RTL friendly and Bus Master/Firmware friendly. These “rules” are direct implications of the two observations we made earlier.

For data to be correctly sampled at the input of IP:

Input data should be written to the data_in register only if the gcd_ip_o_ready bit is set in the status register, i.e., the IP is ready to accept inputs. Complying with this rule is the responsibility of the Bus Master/Firmware.
Whenever data is written to data_in register, gcd_ip_i_valid should be internally generated and pulsed for one cycle. Implementation of this logic is the responsibility of Mapping Logic.

For data to be correctly sampled at the output of IP:

Output data should be read out from the data_out register only if the gcd_ip_o_valid bit is set in the status register, i.e., valid output is available from the IP. Complying with this rule is the responsibility of the Bus Master/Firmware.
Whenever data is read from data_out register, gcd_ip_i_ready should be internally generated and pulsed for one cycle. Implementation of this logic is the responsibility of Mapping Logic.

Incorporate the above extra logic to Mapping Logic.
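The two Mapping Logic rules can be summarised with a small behavioural model in C. It is only a sketch under stated assumptions: the type and function names are hypothetical, and each call stands for one completed APB access that produces strobes lasting a single clock cycle.

```c
#include <stdint.h>
#include <stdbool.h>

/* Strobes driven to the core for one clock cycle after an access. */
typedef struct {
    bool i_valid;   /* models gcd_ip_i_valid */
    bool i_ready;   /* models gcd_ip_i_ready */
} gcd_strobes_t;

/* One completed APB access -> one-cycle strobes towards the core.
 * Write to data_in (offset 0x08) pulses i_valid.
 * Read of data_out (offset 0x0C) pulses i_ready.
 * Every other access leaves both strobes low. */
gcd_strobes_t gcd_map_access(bool is_write, uint32_t offset)
{
    gcd_strobes_t s = { false, false };
    unsigned idx = (unsigned)((offset >> 2) & 0x3u);

    if (is_write && idx == 2u) {
        s.i_valid = true;
    }
    if (!is_write && idx == 3u) {
        s.i_ready = true;
    }
    return s;
}
```

Because the strobes are derived from the accesses themselves, firmware never has to toggle valid or ready bits explicitly; it only has to respect the two status checks listed above.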

Finally…

That finishes off our whole design process! We have successfully converted the GCD Calculator IP to a memory-mapped APB peripheral.

You can think of different implementations from what is given in the blog, but the idea/concept remains same. In fact, you can implement the same concept for a different wrapper like AXI4-Lite or AHB-Lite.

What’s next?

Consider the case in which a processor is communicating with our IP (peripheral) in the system. The processor writes input data to the IP. Now, it has to continuously poll the status register to check whether valid output is available in the data_out register yet. This wastes cycles and bandwidth for a bus master like a processor, which has to keep polling the peripheral register instead of executing other potential tasks. Instead, what if the peripheral could send an interrupt to the processor when the output is ready? In this way, the processor need not poll anymore. After writing input data to the peripheral, it can continue with other tasks, and has to read the output only on receiving the interrupt (in firmware, this is the first step done in the ISR).

Let us see how to add interrupts to the IP which we just designed.

Adding Interrupt to the IP

Typically, a memory-mapped peripheral IP with an interrupt has an interrupt enable and an interrupt status (asserted or de-asserted?). We will reserve a bit for interrupt enable in the control register. Let’s make it more interesting by reserving a bit for interrupt type as well; both Level-triggered and Edge-triggered interrupts will be supported by the IP. The modified control register looks like:

S No	Address	Register	Access type	Description
1	0x00	control	RW	To control the IP operation. [0] : Enable/Disable [1] : Interrupt Enable/Disable [2] : Interrupt Type (Level/Edge)

An Edge-triggered interrupt is just a single-cycle pulse, while a Level-triggered interrupt remains asserted until it is acknowledged. Regardless of the type, the interrupt has to be acknowledged for the IP to continue its operation from the stalled state.

Interrupt if enabled, has to be asserted when the output is valid in data_out register. Or in other words, interrupt status is the same as gcd_ip_o_valid in status register, hence no need to provide a separate bit for interrupt status. It also means that implementing Level-triggered interrupt is as simple as:

assign intr_lvl = gcd_ip_o_valid ;    // or status_register [0] if one cycle delay is fine

To generate Edge-triggered interrupt, simply register gcd_ip_o_valid synchronously to the clock, and then add the rising-edge detection logic:

assign intr_edge = gcd_ip_o_valid & ~gcd_ip_o_valid_reg ;

Finally, add the logic to drive the interrupt to output:

assign intr_en   = control_register [1] ;
assign intr_type = control_register [2] ;
assign o_intr    = (intr_en)? ((intr_type)? intr_edge : intr_lvl) : 1'b0 ;

As simple as that!

So how is the interrupt de-asserted? How is interrupt acknowledgement happening here? And which signal is acting as acknowledgement?

Reading the data_out register is deemed as interrupt acknowledgement. Since this read operation internally generates gcd_ip_i_ready pulse which pulls down the gcd_ip_o_valid, the interrupt gets de-asserted. The IP is now ready to accept new data input and continue its operation.
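The interrupt logic above can be checked against a cycle-by-cycle software model. The C sketch below mirrors the three assign statements; the type and function names are hypothetical, and each call represents one clock cycle, with the o_valid register updated at the end of the cycle.

```c
#include <stdint.h>
#include <stdbool.h>

/* Holds gcd_ip_o_valid registered on the previous clock edge. */
typedef struct { bool o_valid_reg; } gcd_intr_state_t;

/* Evaluate o_intr for the current cycle, then register o_valid.
 * control [1] = interrupt enable, control [2] = type (1: edge, 0: level),
 * matching the ternary expression in the RTL snippet. */
bool gcd_intr_step(gcd_intr_state_t *st, uint32_t control, bool o_valid)
{
    bool intr_en   = ((control >> 1) & 1u) != 0u;
    bool intr_type = ((control >> 2) & 1u) != 0u;
    bool intr_lvl  = o_valid;                        /* level: follows o_valid  */
    bool intr_edge = o_valid && !st->o_valid_reg;    /* edge: rising edge only  */

    st->o_valid_reg = o_valid;                       /* models the register     */
    return intr_en ? (intr_type ? intr_edge : intr_lvl) : false;
}
```

With control = 0x7 (edge type), the model raises the interrupt only in the first cycle in which o_valid is high; with control = 0x3 (level type), it stays high for as long as o_valid does.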

Source Codes

The test bench has been modeled to demonstrate how driver APIs are developed for firmware access to the IP. For example, to send data to the IP from firmware, the API would look something like this in C:

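#include <stdint.h>

void gcd_write (uint8_t a, uint8_t b) {

   uint32_t sts = 0 ;
   uint32_t ab  = ((uint32_t) a << 8) | b ;     // Pack {a, b} into data_in [15 : 0]

   // Poll status register until bit [1] (ready to accept inputs) is set
   while (!(sts & 0x2)) sts = read_x32 (GCD_STS_REG) ;
   write_x32 (GCD_DATAIN_REG, ab) ;             // Write to data_in register {a, b}

}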
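A matching read-side API could look something like the sketch below. It is not taken from the project sources: gcd_read and GCD_DATAOUT_REG are hypothetical names chosen by analogy with the write API, and read_x32 is replaced by a host-side stand-in backed by a small array so that the sketch compiles on its own.

```c
#include <stdint.h>

/* Hypothetical host-side stand-ins for the real register access layer.
 * mock_regs models control, status, data_in, data_out; status [0] is
 * preset to 1 (output valid) and data_out holds an example result. */
static uint32_t mock_regs[4] = { 0x1u, 0x1u, 0x0u, 0x6u };
#define GCD_STS_REG      1
#define GCD_DATAOUT_REG  3

static uint32_t read_x32(int reg)
{
    return mock_regs[reg];
}

/* Wait for valid output, then read it. On the real peripheral, the read
 * of data_out also acts as the acknowledgement that releases the core. */
uint8_t gcd_read(void)
{
    uint32_t sts = 0;

    while (!(sts & 0x1u)) {
        sts = read_x32(GCD_STS_REG);   /* poll status [0]: output valid */
    }
    return (uint8_t)(read_x32(GCD_DATAOUT_REG) & 0xFFu);
}
```

On real hardware, the stand-ins would be replaced by the actual read_x32 implementation and register addresses; the polling loop could also be replaced by waiting for the interrupt described earlier.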

All source codes for this project are available for free in my github here.
