Designing RISC-V CPU from scratch – Part 5: Decode Unit

Recap!

I hope everyone has gone through the previous part of the RISC-V CPU Development blog series, where we discussed the design of the Fetch Unit (FU) of Pequeno. If not, please go through it before moving ahead.

In this blog, we will design Decode Unit (DU) of Pequeno.

Decode Unit

Decode Unit (DU) is Stage 2 of the CPU pipeline. It decodes the instructions received from the Fetch Unit (FU) and sends them to the Execution Unit (EXU). It is also responsible for decoding the register addresses and sending them to the Register File for register read operations.

Interfaces

Let’s define the interfaces for Decode Unit.

FU Interface	To receive instruction, control/data from Fetch Unit
Register File Interface	To access the source registers (rs0, rs1) for register read operation
EXU Interface	To send the decoded instruction, control/data to Execution Unit
Flush Interface	To flush DU externally

Table: Decode Unit – Interfaces

Decode Unit – Interfaces

FU Interface

This is the primary interface between Fetch Unit and Decode Unit to receive the payload. The payload includes the fetched instruction and branch prediction information. This interface was already discussed in the previous part.

EXU Interface

This is the primary interface between Decode Unit and Execution Unit to send the payload. The payload includes the decoded instruction, branch prediction information, and decode data.

EXU Interface to send payload

The following instruction and branch prediction signals constitute the EXU I/F:

instruction packet	{instruction, PC} to EXU
branch_taken	Branch prediction signal to EXU; simply piped forward: FU->DU->EXU
bubble	Inverted version of valid to EXU
stall	Inverted version of ready from EXU

Instruction packet and branch prediction signals to EXU

Decode data is the vital information that DU decodes from the fetched instruction and sends to EXU. Let’s work out what information EXU requires to execute an instruction.

Opcode, funct3, funct7: to identify the operation to be performed by EXU on the operands.
Operands: depending on the opcode, the operands can be register data (rs0, rs1), register address for writeback (rdt), or 12-bit/20-bit immediate values.
Instruction type: to identify which operands/immediate values have to be processed.

The decoding can be tricky. If you have correctly understood the ISA and the instruction structuring, patterns can be identified for different types of instructions. Identifying patterns helps to design the decoding logic in DU.
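As an illustration of this pattern-based decoding, here is a minimal Python behavioral sketch (not the Pequeno RTL) that derives the is_<r/i/s/b/u/j>_type flags from the opcode field alone, using the opcode values listed in the decode table of this post. The function name instr_type_flags and the opcode-set names are hypothetical.

```python
# Behavioral sketch only; opcode values follow the RV32I base opcode map.
R_TYPE_OPCODES = {0x33}              # OP (ADD, SUB, ...)
I_TYPE_OPCODES = {0x67, 0x03, 0x13}  # JALR, LOAD, OP-IMM
S_TYPE_OPCODES = {0x23}              # STORE
B_TYPE_OPCODES = {0x63}              # BRANCH
U_TYPE_OPCODES = {0x37, 0x17}        # LUI, AUIPC
J_TYPE_OPCODES = {0x6F}              # JAL


def instr_type_flags(instr: int) -> dict:
    """Return the is_<r/i/s/b/u/j>_type flags for a 32-bit instruction word."""
    opcode = instr & 0x7F  # opcode = instruction[6:0]
    return {
        "is_r_type": opcode in R_TYPE_OPCODES,
        "is_i_type": opcode in I_TYPE_OPCODES,
        "is_s_type": opcode in S_TYPE_OPCODES,
        "is_b_type": opcode in B_TYPE_OPCODES,
        "is_u_type": opcode in U_TYPE_OPCODES,
        "is_j_type": opcode in J_TYPE_OPCODES,
    }


# Usage: ADD x3, x1, x2 is encoded as 0x002081B3.
flags = instr_type_flags(0x002081B3)
print([name for name, hit in flags.items() if hit])  # ['is_r_type']
```

In hardware, each flag reduces to an equality compare on instruction[6:0], so the flags are one-hot for any opcode in the table and all zero for opcodes the table does not cover.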

The following information is decoded and sent to EXU via the EXU I/F:

opcode	Instruction opcode: opcode = instruction[6:0]
rs0, rs1, rdt	Source register 0, source register 1, destination register: rs0 = instruction[19:15], rs1 = instruction[24:20], rdt = instruction[11:7]
funct3/funct7	funct3 = instruction[14:12], funct7 = instruction[31:25]
is_<r/i/s/b/u/j>_type	Instruction type: 1) R-type: (opcode == 0x33); 2) I-type: (opcode == 0x67) or (opcode == 0x03) or (opcode == 0x13); 3) S-type: (opcode == 0x23); 4) B-type: (opcode == 0x63); 5) U-type: (opcode == 0x37) or (opcode == 0x17); 6) J-type: (opcode == 0x6F)
alu_opcode[3:0]	ALU opcode. Instructions that require the ALU are categorized as ALU instructions: 1) R-type instructions, 2) I-type instructions, 3) U-type instructions (LUI and AUIPC require an add operation, hence they are treated as ALU instructions). R-type: alu_opcode = {funct3, funct7[5]}; I-type: alu_opcode = {funct3, funct7[5]} for SLLI/SRLI/SRAI, and {funct3, 1'b0} for the other I-type instructions, whose instruction[30] is part of the immediate; U-type: alu_opcode = 4'b0000
<i/s/b/u/j>_type_imm	Immediate value: 1) I-type imm[11:0] = instruction[31:20]; 2) S-type imm[11:0] = {instruction[31:25], instruction[11:7]}; 3) B-type imm[11:0] = {instruction[31], instruction[7], instruction[30:25], instruction[11:8]}, i.e. branch offset bits [12:1] (offset bit 0 is always 0); 4) U-type imm[19:0] = instruction[31:12]; 5) J-type imm[19:0] = {instruction[31], instruction[19:12], instruction[20], instruction[30:21]}, i.e. jump offset bits [20:1] (offset bit 0 is always 0)

Decode data to EXU
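The bit slicing in the table above can be checked with a small Python behavioral sketch. It is not the Pequeno RTL: the helper names (bits, decode_fields, alu_opcode, raw_immediates) are hypothetical, the "ALU I-type" case is assumed to mean OP-IMM (opcode 0x13), and sign extension of the immediates is deliberately left out.

```python
def bits(x: int, hi: int, lo: int) -> int:
    """Return x[hi:lo] (inclusive), like a Verilog part-select."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_fields(instr: int) -> dict:
    """Register addresses and function fields, named as in this post."""
    return {
        "opcode": bits(instr, 6, 0),
        "rdt": bits(instr, 11, 7),
        "funct3": bits(instr, 14, 12),
        "rs0": bits(instr, 19, 15),
        "rs1": bits(instr, 24, 20),
        "funct7": bits(instr, 31, 25),
    }


def alu_opcode(instr: int):
    """4-bit ALU opcode, or None if this sketch does not treat it as an ALU op."""
    f = decode_fields(instr)
    funct7_5 = bits(f["funct7"], 5, 5)
    if f["opcode"] == 0x33:                    # R-type
        return (f["funct3"] << 1) | funct7_5
    if f["opcode"] == 0x13:                    # OP-IMM (assumed ALU I-types)
        if f["funct3"] in (0b001, 0b101):      # SLLI, SRLI/SRAI
            return (f["funct3"] << 1) | funct7_5
        return f["funct3"] << 1                # {funct3, 1'b0}
    if f["opcode"] in (0x37, 0x17):            # LUI, AUIPC use an add
        return 0b0000
    return None


def raw_immediates(instr: int) -> dict:
    """Unsigned immediate fields, concatenated exactly as in the table."""
    return {
        "i_type_imm": bits(instr, 31, 20),
        "s_type_imm": (bits(instr, 31, 25) << 5) | bits(instr, 11, 7),
        "b_type_imm": (bits(instr, 31, 31) << 11) | (bits(instr, 7, 7) << 10)
        | (bits(instr, 30, 25) << 4) | bits(instr, 11, 8),
        "u_type_imm": bits(instr, 31, 12),
        "j_type_imm": (bits(instr, 31, 31) << 19) | (bits(instr, 19, 12) << 11)
        | (bits(instr, 20, 20) << 10) | bits(instr, 30, 21),
    }


print(alu_opcode(0xFFF00093))                    # ADDI x1, x0, -1 -> 0
print(alu_opcode(0x4030D093))                    # SRAI x1, x1, 3  -> 11
print(hex(raw_immediates(0xFFDFF06F)["j_type_imm"]))  # JAL x0, -4 -> 0xffffe
```

The ADDI example shows why the non-shift I-type case forces the LSB to 1'b0: with a negative immediate, instruction[30] is 1, and taking funct7[5] blindly would turn an add into a subtract.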

EXU uses this information to demultiplex the data to the appropriate execution sub-units and execute the instruction.

Refer to Part 1 to revisit the ISA and understand the reasoning behind the decoding logic used by Decode Unit.

Register File Interface

For R-type instructions, both source registers rs0 and rs1 (named rs1 and rs2 in the RISC-V specification) have to be decoded and read; I-type instructions read rs0, while S-type and B-type instructions read both. The data read from these registers are the operands. All general-purpose user registers reside in the Register File, outside DU. DU uses the Register File Interface to send the rs0 and rs1 addresses to the Register File for register access. The data read from the Register File must reach EXU in the same clock cycle as the DU payload.

Decode Unit and Register File interaction with EXU

The Register File takes one cycle to read a register, and DU takes one cycle to register the payload it sends to EXU. The source register addresses are therefore decoded directly from the FU instruction packet by combinational logic, so the register read starts in the same cycle in which DU registers the payload. This ensures that 1) the payload from DU to EXU and 2) the data from the Register File to EXU arrive in the same clock cycle.
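The timing argument above can be illustrated with a small cycle-level Python sketch. It is a hypothetical model, not Pequeno's RTL: RegisterFile and simulate are illustrative names, the Register File is modelled as a synchronous read with one-cycle latency, register contents are fixed (no writeback), and the decode_from_fu=False mode exists only to show what would go wrong if rs0/rs1 were decoded from DU's registered payload instead of the FU packet.

```python
class RegisterFile:
    """Hypothetical RF model: synchronous read with one-cycle latency."""

    def __init__(self, values):
        self.regs = list(values)
        self.rdata0 = 0
        self.rdata1 = 0

    def clock(self, raddr0: int, raddr1: int) -> None:
        # Read data is captured at the clock edge and visible next cycle.
        self.rdata0 = self.regs[raddr0]
        self.rdata1 = self.regs[raddr1]


def simulate(fu_stream, regs, decode_from_fu=True):
    """Return what EXU sees each cycle: (DU payload, RF rdata0, RF rdata1)."""
    rf = RegisterFile(regs)
    du_payload = 0            # DU output register (0 = nothing loaded yet)
    seen = []
    for fu_instr in fu_stream:
        # True: combinational decode from the FU packet (as in this post).
        # False: decode from DU's registered payload (for contrast only).
        src = fu_instr if decode_from_fu else du_payload
        rs0 = (src >> 15) & 0x1F  # instruction[19:15]
        rs1 = (src >> 20) & 0x1F  # instruction[24:20]
        # Clock edge: DU registers the payload, RF registers its read data.
        du_payload = fu_instr
        rf.clock(rs0, rs1)
        seen.append((du_payload, rf.rdata0, rf.rdata1))
    return seen


regs = [10 * i for i in range(32)]     # x_i holds 10*i
program = [0x002081B3, 0x00218233]     # ADD x3,x1,x2 ; ADD x4,x3,x2
for aligned in (True, False):
    trace = simulate(program, regs, decode_from_fu=aligned)
    print(aligned, [(hex(p), a, b) for p, a, b in trace])
# True [('0x2081b3', 10, 20), ('0x218233', 30, 20)]
# False [('0x2081b3', 0, 0), ('0x218233', 10, 20)]
```

With combinational decode, every payload reaches EXU alongside its own operands; decoding from the registered payload makes the operands lag one instruction behind.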

Stall Logic

Only EXU can externally stall the operation of DU. When EXU asserts stall, DU’s internal instruction pipeline must stall immediately, and DU must also assert stall to FU, since it cannot accept any more packets from FU. The Register File must be stalled together with DU for synchronized operation, as both sit in the same stage of the CPU's 5-stage pipeline. Hence, DU feeds the external stall from EXU forward to the Register File. There are no internal conditions in DU that generate a local stall.
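A minimal Python behavioral sketch of this stall behavior, assuming a single DU output register: the class DecodeStallModel and its signal names are hypothetical, not taken from the Pequeno sources. It shows the two points made above: the EXU stall is forwarded unchanged to FU and to the Register File, and the output register simply holds while stalled. It also models bubble as the inverse of valid, as listed in the EXU I/F table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodeStallModel:
    """Hypothetical model of DU stall handling (single output register)."""
    out_valid: bool = False
    out_packet: Optional[int] = None

    @staticmethod
    def stall_outputs(exu_stall: bool) -> dict:
        # No local stall source in DU: the EXU stall is fed forward as-is.
        return {"stall_to_fu": exu_stall, "stall_to_rf": exu_stall}

    @property
    def bubble(self) -> bool:
        # bubble to EXU is the inverted version of valid.
        return not self.out_valid

    def clock(self, exu_stall: bool, fu_valid: bool, fu_packet: int) -> None:
        if exu_stall:
            return                    # hold the current output while stalled
        self.out_valid = fu_valid
        self.out_packet = fu_packet if fu_valid else None


du = DecodeStallModel()
du.clock(exu_stall=False, fu_valid=True, fu_packet=0x002081B3)
du.clock(exu_stall=True, fu_valid=True, fu_packet=0x00218233)  # held
print(hex(du.out_packet), du.bubble)  # 0x2081b3 False
```

Because FU also sees the stall, the second packet stays at DU's input and is loaded on the first cycle after EXU releases the stall.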

Flush Logic

Only EXU can externally flush DU. EXU initiates branch_flush in the CPU instruction pipeline, along with the address of the next instruction to be fetched after the pipeline is flushed (branch_pc). DU provides a Flush I/F so that this external flush can be accepted.

The internal pipeline is flushed by branch_flush. In addition, branch_flush from EXU must invalidate the DU instruction to EXU immediately, with zero-cycle delay, to avoid a potential control hazard in EXU in the next clock cycle.
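The difference between the combinational (zero-cycle) invalidation and the registered flush can be sketched in Python. This is a hypothetical behavioral model, not the Pequeno RTL: valid_to_exu and next_out_valid are illustrative names, and giving branch_flush priority over stall is an assumption of this sketch.

```python
def valid_to_exu(out_valid: bool, branch_flush: bool) -> bool:
    """Combinational: branch_flush kills DU's output in the same cycle."""
    return out_valid and not branch_flush


def next_out_valid(out_valid: bool, branch_flush: bool,
                   exu_stall: bool, fu_valid: bool) -> bool:
    """DU output-valid register value after the clock edge.

    Assumption: flush takes priority over stall in this sketch.
    """
    if branch_flush:
        return False         # flush clears the internal pipeline
    if exu_stall:
        return out_valid     # hold while stalled
    return fu_valid          # load the next FU packet


# Cycle in which EXU asserts branch_flush while DU holds a valid instruction:
print(valid_to_exu(True, branch_flush=True))                       # False
print(next_out_valid(True, True, exu_stall=False, fu_valid=True))  # False
```

Note that the FU packet arriving during the flush cycle is dropped by DU's own flush, which is why FU does not need to invalidate it combinationally.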

In the design of Fetch Unit, we did not invalidate the FU instruction to DU with zero-cycle delay on receiving branch_flush. DU will also be in flush in the next clock cycle, so no control hazard can occur in DU, and invalidating the FU instruction is unnecessary. The same reasoning applies to the instruction from IMEM to FU.

Decode Unit Instruction Buffering in the Pipeline

The above flow chart shows how the instruction packet and branch prediction data from FU are buffered in DU in the instruction pipeline. Only a single stage of buffering is used in DU.

Architecture

Let’s integrate all the micro-architectures we designed so far to complete the architecture of Decode Unit.

Decode Unit – Architecture

That’s all folks! We have successfully designed the Decode Unit of Pequeno 🙂

GitHub Repo of Pequeno

Decode Unit has been added to the GitHub repo of Pequeno. Follow me on GitHub and add the repo to your favorites!

Find the repo here: pequeno_riscv

What’s next?

We have so far completed: Fetch Unit (FU), Decode Unit (DU). In the upcoming part, we will be designing Register File of Pequeno.

Visit the complete blog series

This post is part of RISC-V CPU Development blog series

Support

Leave a comment or visit support for any queries/feedback regarding the content of this blog.