I hope all of you have gone through the previous part of the RISC-V CPU Development blog series, where we discussed the design of the Fetch Unit (FU) of Pequeno. If not, please go through it before moving ahead.
In this blog, we will design the Decode Unit (DU) of Pequeno.
Decode Unit
The Decode Unit (DU) is Stage 2 of the CPU pipeline. It decodes the instructions from the Fetch Unit (FU) and sends them to the Execution Unit (EXU). It is also responsible for decoding the register addresses and sending them to the Register File for the register read operation.
Interfaces
Let’s define the interfaces for Decode Unit.
FU Interface | To receive instruction, control/data from Fetch Unit |
Register File Interface | To access the source registers (rs0, rs1) for register read operation |
EXU Interface | To send the decoded instruction, control/data to Execution Unit |
Flush Interface | To flush DU externally |
Table: Decode Unit – Interfaces
Decode Unit – Interfaces
FU Interface
This is the primary interface between Fetch Unit and Decode Unit to receive the payload. The payload includes the fetched instruction and branch prediction information. This interface was already discussed in the previous part.
EXU Interface
This is the primary interface between Decode Unit and Execution Unit to send the payload. The payload includes the decoded instruction, branch prediction information, and decode data.
EXU Interface to send payload
The following instruction and branch prediction signals constitute the EXU I/F:
instruction packet | {instruction, PC} to EXU |
branch_taken | Branch prediction signal to EXU; simply piped forward: FU->DU->EXU |
bubble | Inverted version of valid to EXU |
stall | Inverted version of ready from EXU |
Instruction packet and branch prediction signals to EXU
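To make the bubble/stall naming concrete, here is a minimal Python sketch of the handshake. It assumes the usual valid/ready convention, where a packet moves only when the sender is valid and the receiver is ready. The names `ExuPacket` and `transfer_happens` are hypothetical and do not come from the Pequeno sources.

```python
from dataclasses import dataclass


@dataclass
class ExuPacket:
    """Hypothetical container for the signals listed in the table above."""
    instruction: int     # 32-bit instruction word
    pc: int              # address of that instruction
    branch_taken: bool   # prediction piped forward FU -> DU -> EXU
    bubble: bool         # inverted valid: True means "nothing useful here"


def transfer_happens(bubble: bool, stall: bool) -> bool:
    """Assumed valid/ready rule: DU hands over a packet only when
    it is not a bubble (valid) and EXU is not stalling (ready)."""
    valid = not bubble
    ready = not stall
    return valid and ready


pkt = ExuPacket(instruction=0x002081B3, pc=0x100, branch_taken=False, bubble=False)
accepted = transfer_happens(pkt.bubble, stall=False)
```

With this convention a bubble is simply an invalid slot moving down the pipeline, and a stall is the receiver refusing new data.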
Decode data is the information that DU extracts from the fetched instruction and sends to EXU. Let’s gather what information EXU requires to execute an instruction.
- Opcode, funct3, funct7: to identify the operation to be performed by EXU on the operands.
- Operands: depending on the opcode, the operands can be register data (rs0, rs1), register address for writeback (rdt), or 12-bit/20-bit immediate values.
- Instruction type: to identify which operands/immediate values have to be processed.
The decoding can be tricky. Once you have understood the ISA and how the instructions are structured, you can identify patterns across the different instruction types. These patterns drive the design of the decoding logic in DU.
The following information is decoded and sent to EXU via the EXU I/F.
opcode | Instruction opcode. opcode = instruction[6:0] |
rs0, rs1, rdt | Source registers 0/1 and destination register. rs0 = instruction[19:15]; rs1 = instruction[24:20]; rdt = instruction[11:7] |
funct3/funct7 | funct3 = instruction[14:12]; funct7 = instruction[31:25] |
is_<r/i/s/b/u/j>_type | Instruction type. 1) R-type –> (opcode == 0x33) 2) I-type –> (opcode == 0x67) or (opcode == 0x03) or (opcode == 0x13) 3) S-type –> (opcode == 0x23) 4) B-type –> (opcode == 0x63) 5) U-type –> (opcode == 0x37) or (opcode == 0x17) 6) J-type –> (opcode == 0x6F) |
alu_opcode[3:0] | ALU opcode. Instructions that require the ALU are categorized as ALU instructions: 1) R-type instructions 2) I-type instructions 3) U-type instructions (LUI and AUIPC require an add operation, hence they are considered ALU instructions). R-type: alu_opcode = {funct3, funct7[5]}. I-type: alu_opcode = {funct3, funct7[5]} for SLLI/SRLI/SRAI instructions, and {funct3, 1’b0} otherwise. U-type: alu_opcode = 4’b0000 |
<i/s/b/u/j>_type_imm | Immediate value. 1) I-type imm[11:0] = instruction[31:20] 2) S-type imm[11:0] = {instruction[31:25], instruction[11:7]} 3) B-type imm[11:0] = {instruction[31], instruction[7], instruction[30:25], instruction[11:8]} 4) U-type imm[19:0] = instruction[31:12] 5) J-type imm[19:0] = {instruction[31], instruction[19:12], instruction[20], instruction[30:21]} |
Decode data to EXU
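The field, type, and ALU opcode rows of the decode table above can be sketched in Python as follows. This is an illustrative software model, not the RTL from the Pequeno repo. The helper names (`bits`, `decode`, `alu_opcode`) are hypothetical. The sketch assumes that among I-type instructions only the shift-immediates under opcode 0x13 use funct7[5], and it returns `None` as a don't-care for non-ALU types.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return word[hi:lo] as an unsigned integer (Verilog-style slice)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


# Opcode groups copied from the decode table.
TYPE_OPCODES = {
    "r": {0x33},
    "i": {0x67, 0x03, 0x13},
    "s": {0x23},
    "b": {0x63},
    "u": {0x37, 0x17},
    "j": {0x6F},
}


def alu_opcode(f: dict):
    """Build the 4-bit ALU opcode {funct3, bit} from decoded fields."""
    funct7_5 = (f["funct7"] >> 5) & 1
    if f["is_r_type"]:
        return (f["funct3"] << 1) | funct7_5
    if f["is_i_type"]:
        # Assumption: only SLLI/SRLI/SRAI (funct3 001/101, opcode 0x13)
        # carry a real funct7; elsewhere bit 30 belongs to the immediate.
        is_shift_imm = f["opcode"] == 0x13 and f["funct3"] in (0b001, 0b101)
        return (f["funct3"] << 1) | (funct7_5 if is_shift_imm else 0)
    if f["is_u_type"]:
        return 0b0000  # LUI/AUIPC are treated as an add
    return None  # assumed don't-care for S/B/J types


def decode(instr: int) -> dict:
    """Split a 32-bit instruction into the fields listed in the table."""
    opcode = bits(instr, 6, 0)
    fields = {
        "opcode": opcode,
        "rs0": bits(instr, 19, 15),
        "rs1": bits(instr, 24, 20),
        "rdt": bits(instr, 11, 7),
        "funct3": bits(instr, 14, 12),
        "funct7": bits(instr, 31, 25),
    }
    for name, opcodes in TYPE_OPCODES.items():
        fields[f"is_{name}_type"] = opcode in opcodes
    fields["alu_opcode"] = alu_opcode(fields)
    return fields


add = decode(0x002081B3)   # add  x3, x1, x2
addi = decode(0xFFF10093)  # addi x1, x2, -1
```

The `addi` case shows why the I-type rule needs the split: bit 30 of that instruction is set because it is part of the immediate, so using funct7[5] there would turn the add into a different ALU operation.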
EXU will use this information to de-mux the data to appropriate execution sub-units and execute the instruction.
Refer to Part-1 to refresh the ISA and understand the reasoning behind the decoding logic used by Decode Unit.
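The immediate rows of the decode table can be checked against real encodings with a small Python model. The function names are hypothetical. One point worth noting from the RISC-V specification: the 12 bits assembled for B-type and the 20 bits assembled for J-type are bits [12:1] and [20:1] of the byte offset, so the offset itself is the sign-extended value shifted left by one. Whether Pequeno performs that last step in DU or in EXU is not stated here, so `byte_offset` is only an assumption about where it could sit.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return word[hi:lo] as an unsigned integer (Verilog-style slice)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def i_type_imm(instr: int) -> int:
    return bits(instr, 31, 20)


def s_type_imm(instr: int) -> int:
    return (bits(instr, 31, 25) << 5) | bits(instr, 11, 7)


def b_type_imm(instr: int) -> int:
    # {instruction[31], instruction[7], instruction[30:25], instruction[11:8]}
    return (bits(instr, 31, 31) << 11) | (bits(instr, 7, 7) << 10) \
        | (bits(instr, 30, 25) << 4) | bits(instr, 11, 8)


def u_type_imm(instr: int) -> int:
    return bits(instr, 31, 12)


def j_type_imm(instr: int) -> int:
    # {instruction[31], instruction[19:12], instruction[20], instruction[30:21]}
    return (bits(instr, 31, 31) << 19) | (bits(instr, 19, 12) << 11) \
        | (bits(instr, 20, 20) << 10) | bits(instr, 30, 21)


def sign_extend(value: int, width: int) -> int:
    """Interpret a width-bit field as a two's complement number."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def byte_offset(imm: int, width: int) -> int:
    """B/J-type only: the field holds offset bits [width:1]."""
    return sign_extend(imm, width) << 1


# beq x0, x0, -4 encodes as 0xFE000EE3
branch_back = byte_offset(b_type_imm(0xFE000EE3), 12)
# jal x1, +16 encodes as 0x010000EF
jump_fwd = byte_offset(j_type_imm(0x010000EF), 20)
```

Here `branch_back` evaluates to -4 and `jump_fwd` to 16, matching the assembly in the comments.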
Register File Interface
For R-type instructions, the source registers rs0 and rs1 have to be decoded and read. The data read from the registers are the operands. All the general-purpose user registers reside in the Register File, outside DU. DU uses the Register File Interface to send the rs0 and rs1 addresses to the Register File for register access. The data read from the Register File should reach EXU in the same clock cycle as the payload from DU.
Decode Unit and Register File interaction with EXU
Register File requires one cycle to read a register. DU also takes one cycle to register the payload to be sent to EXU. The source register addresses are hence decoded directly from the FU instruction packet by combinational logic, so that the register read starts in the same cycle in which DU captures the payload. This ensures that 1) the payload from DU to EXU and 2) the data from Register File to EXU arrive in the same cycle.
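The timing argument can be illustrated with a tiny cycle-level model in Python. It is a sketch under simplifying assumptions: one clock edge per call, no stall or flush, and a register file with two synchronous read ports. All class and method names are hypothetical.

```python
class RegisterFile:
    """Synchronous read: data for an address shows up after the clock edge."""

    def __init__(self):
        self.regs = [0] * 32
        self.rdata0 = 0
        self.rdata1 = 0

    def clock(self, raddr0: int, raddr1: int) -> None:
        self.rdata0 = self.regs[raddr0]
        self.rdata1 = self.regs[raddr1]


class DecodeUnit:
    def __init__(self):
        self.payload = None  # registered payload towards EXU

    @staticmethod
    def source_addresses(fu_instr: int):
        """Combinational decode of rs0/rs1 straight from the FU instruction."""
        return (fu_instr >> 15) & 0x1F, (fu_instr >> 20) & 0x1F

    def clock(self, fu_instr: int) -> None:
        self.payload = fu_instr


def clock_edge(du: DecodeUnit, rf: RegisterFile, fu_instr: int) -> None:
    """One rising edge: both registers capture in the same cycle."""
    raddr0, raddr1 = du.source_addresses(fu_instr)  # no register in this path
    rf.clock(raddr0, raddr1)
    du.clock(fu_instr)


rf = RegisterFile()
rf.regs[1], rf.regs[2] = 10, 20
du = DecodeUnit()
clock_edge(du, rf, 0x002081B3)  # add x3, x1, x2 arrives from FU
operands = (rf.rdata0, rf.rdata1)
```

After the single edge, `du.payload` holds the `add` instruction and `operands` holds the values of x1 and x2, so EXU sees both together. Had the addresses been taken from DU's registered payload instead, the operands would lag the payload by one cycle.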
Stall Logic
Only EXU can externally stall the operation of DU. When EXU asserts stall, DU’s internal instruction pipeline should be stalled immediately, and DU should also assert stall to FU, as it cannot accept any more packets from FU. Register File should be stalled together with DU for synchronized operation, as both of them are at the same stage of the 5-stage pipeline of the CPU. Hence, DU feeds forward the external stall from EXU to Register File. There are no internal conditions in DU that generate a local stall.
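A minimal Python sketch of this stall behaviour follows. It assumes that a stall simply freezes DU's single payload register and that the same signal is passed unchanged to FU and to Register File. The class and method names are hypothetical.

```python
class StallableDecodeStage:
    """Hypothetical model of DU's one-deep pipeline register with stall."""

    def __init__(self):
        self.payload = None

    @staticmethod
    def stall_to_fu(exu_stall: bool) -> bool:
        # DU has no local stall source, so EXU's stall is forwarded as-is.
        return exu_stall

    @staticmethod
    def stall_to_regfile(exu_stall: bool) -> bool:
        # Register File freezes together with DU.
        return exu_stall

    def clock(self, fu_payload, exu_stall: bool) -> None:
        # On a stall the register keeps its value; otherwise it captures.
        if not exu_stall:
            self.payload = fu_payload


stage = StallableDecodeStage()
stage.clock("instr_A", exu_stall=False)  # captured
stage.clock("instr_B", exu_stall=True)   # held: EXU is not ready
held = stage.payload
stage.clock("instr_B", exu_stall=False)  # stall released, B moves in
```

Because FU sees the stall in the same cycle, it keeps presenting `instr_B` until DU can take it, so no instruction is lost or duplicated.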
Flush Logic
Only EXU can externally flush DU. EXU initiates branch_flush in the CPU instruction pipeline, along with the address of the next instruction to be fetched after flushing the pipeline (branch_pc). DU provides a Flush I/F so that the external flush can be accepted.
The internal pipeline is flushed by branch_flush. The branch_flush from EXU should immediately invalidate the DU instruction to EXU with 0 cycle delay. This avoids a potential control hazard in EXU in the next clock cycle.
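The 0 cycle invalidation can be sketched in Python as a combinational gate on the valid output, in addition to clearing the pipeline register at the clock edge. This is one plausible way to realize the behaviour described above, not necessarily how the Pequeno RTL does it. The names are hypothetical.

```python
class FlushableDecodeStage:
    """Hypothetical model of DU's output stage with branch_flush."""

    def __init__(self):
        self.valid_reg = False
        self.instr_reg = 0

    def valid_to_exu(self, branch_flush: bool) -> bool:
        # Combinational gating: the instruction is invalidated in the
        # same cycle in which branch_flush is asserted (0 cycle delay).
        return self.valid_reg and not branch_flush

    def clock(self, fu_valid: bool, fu_instr: int, branch_flush: bool) -> None:
        if branch_flush:
            self.valid_reg = False  # pipeline register becomes a bubble
        else:
            self.valid_reg = fu_valid
            self.instr_reg = fu_instr


stage = FlushableDecodeStage()
stage.clock(fu_valid=True, fu_instr=0x002081B3, branch_flush=False)
seen_without_flush = stage.valid_to_exu(branch_flush=False)
seen_during_flush = stage.valid_to_exu(branch_flush=True)
stage.clock(fu_valid=True, fu_instr=0x00000013, branch_flush=True)
seen_after_flush = stage.valid_to_exu(branch_flush=False)
```

In this model the instruction is visible to EXU only while no flush is asserted, and the wrong-path instruction offered by FU during the flush cycle is never captured.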
In the design of Fetch Unit, we didn't invalidate the FU instruction to DU with 0 cycle delay on receiving branch_flush. This is because DU will also be flushed in the next clock cycle, hence no control hazard can happen in DU. So, it is not necessary to invalidate the FU instruction. The same idea applies to the instruction from IMEM to FU.
The above flow chart represents how the instruction packet and branch prediction data from FU are buffered in DU in the instruction pipeline. Only a single stage of buffering is used in DU.
Architecture
Let’s integrate all the micro-architectures we designed so far to complete the architecture of Decode Unit.
Decode Unit – Architecture
That’s all folks! We have successfully designed the Decode Unit of Pequeno 🙂
GitHub Repo of Pequeno
Decode Unit has been added to the GitHub repo of Pequeno. Follow me on GitHub and add the repo to favorites!
Find the repo here: pequeno_riscv
What’s next?
We have so far completed: Fetch Unit (FU), Decode Unit (DU). In the upcoming part, we will design the Register File of Pequeno.
Visit the complete blog series
This post is part of the RISC-V CPU Development blog series.
Support
Leave a comment or visit support for any queries/feedback regarding the content of this blog.