Rocket RISC-V processor - Adding custom instructions
Introduction
In the context of a research project related to RIMI [1], we need to add new instructions in the Rocket Chip decoder. @QDucasse already proposed an analysis of the structure of the Rocket Chip source code [1]. This blog post aims to give a bit more details about the instruction encoding for load/store instructions in particular.
Instructions specification
Each instruction is defined with a dictonary of parameters (rocket-chip/IDecode.scala at v1.6 · chipsalliance/rocket-chip · GitHub). Here is a summary for the RVI subset:
| legal | fp | rocc | branch | jal | jalr | rxs2 | rxs1 | scie | sel_alu2 | sel_alu1 | sel_imm | alu_dw | alu_fn | mem | mem_cmd | rfs1 | rfs2 | rfs3 | wfd | mul | div | wxd | csr | fence_i | fence | amo | dp | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RVI | BNE | Y | N | N | Y | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_SB | DW_X | FN_SNE | N | M_X | N | N | N | N | N | N | N | CSR.N | N | N | N | N |
| BEQ | Y | N | N | Y | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_SB | DW_X | FN_SEQ | N | M_X | N | N | N | N | N | N | N | CSR.N | N | N | N | N | |
| BLT | Y | N | N | Y | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_SB | DW_X | FN_SLT | N | M_X | N | N | N | N | N | N | N | CSR.N | N | N | N | N | |
| BLTU | Y | N | N | Y | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_SB | DW_X | FN_SLTU | N | M_X | N | N | N | N | N | N | N | CSR.N | N | N | N | N | |
| BGE | Y | N | N | Y | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_SB | DW_X | FN_SGE | N | M_X | N | N | N | N | N | N | N | CSR.N | N | N | N | N | |
| BGEU | Y | N | N | Y | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_SB | DW_X | FN_SGEU | N | M_X | N | N | N | N | N | N | N | CSR.N | N | N | N | N | |
| JAL | Y | N | N | N | Y | N | N | N | N | A2_SIZE | A1_PC | IMM_UJ | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| JALR | Y | N | N | N | N | Y | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| AUIPC | Y | N | N | N | N | N | N | N | N | A2_IMM | A1_PC | IMM_U | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| LB | Y | N | N | N | N | N | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_ADD | Y | M_XRD | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| LH | Y | N | N | N | N | N | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_ADD | Y | M_XRD | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| LW | Y | N | N | N | N | N | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_ADD | Y | M_XRD | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| LBU | Y | N | N | N | N | N | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_ADD | Y | M_XRD | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| LHU | Y | N | N | N | N | N | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_ADD | Y | M_XRD | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| SB | Y | N | N | N | N | N | Y | Y | N | A2_IMM | A1_RS1 | IMM_S | DW_XPR | FN_ADD | Y | M_XWR | N | N | N | N | N | N | N | CSR.N | N | N | N | N | |
| SH | Y | N | N | N | N | N | Y | Y | N | A2_IMM | A1_RS1 | IMM_S | DW_XPR | FN_ADD | Y | M_XWR | N | N | N | N | N | N | N | CSR.N | N | N | N | N | |
| SW | Y | N | N | N | N | N | Y | Y | N | A2_IMM | A1_RS1 | IMM_S | DW_XPR | FN_ADD | Y | M_XWR | N | N | N | N | N | N | N | CSR.N | N | N | N | N | |
| LUI | Y | N | N | N | N | N | N | N | N | A2_IMM | A1_ZERO | IMM_U | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| ADDI | Y | N | N | N | N | N | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| SLTI | Y | N | N | N | N | N | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_SLT | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| SLTIU | Y | N | N | N | N | N | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_SLTU | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| ANDI | Y | N | N | N | N | N | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_AND | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| ORI | Y | N | N | N | N | N | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_OR | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| XORI | Y | N | N | N | N | N | N | Y | N | A2_IMM | A1_RS1 | IMM_I | DW_XPR | FN_XOR | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| ADD | Y | N | N | N | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_X | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| SUB | Y | N | N | N | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_X | DW_XPR | FN_SUB | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| SLT | Y | N | N | N | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_X | DW_XPR | FN_SLT | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| SLTU | Y | N | N | N | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_X | DW_XPR | FN_SLTU | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| AND | Y | N | N | N | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_X | DW_XPR | FN_AND | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| OR | Y | N | N | N | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_X | DW_XPR | FN_OR | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| XOR | Y | N | N | N | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_X | DW_XPR | FN_XOR | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| SLL | Y | N | N | N | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_X | DW_XPR | FN_SL | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| SRL | Y | N | N | N | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_X | DW_XPR | FN_SR | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| SRA | Y | N | N | N | N | N | Y | Y | N | A2_RS2 | A1_RS1 | IMM_X | DW_XPR | FN_SRA | N | M_X | N | N | N | N | N | N | Y | CSR.N | N | N | N | N | |
| FENCE | Y | N | N | N | N | N | N | N | N | A2_X | A1_X | IMM_X | DW_X | FN_X | N | M_X | N | N | N | N | N | N | N | CSR.N | N | Y | N | N | |
| SCALL | Y | N | N | N | N | N | N | X | N | A2_X | A1_X | IMM_X | DW_X | FN_X | N | M_X | N | N | N | N | N | N | N | CSR.I | N | N | N | N | |
| SBREAK | Y | N | N | N | N | N | N | X | N | A2_X | A1_X | IMM_X | DW_X | FN_X | N | M_X | N | N | N | N | N | N | N | CSR.I | N | N | N | N | |
| MRET | Y | N | N | N | N | N | N | X | N | A2_X | A1_X | IMM_X | DW_X | FN_X | N | M_X | N | N | N | N | N | N | N | CSR.I | N | N | N | N | |
| WFI | Y | N | N | N | N | N | N | X | N | A2_X | A1_X | IMM_X | DW_X | FN_X | N | M_X | N | N | N | N | N | N | N | CSR.I | N | N | N | N | |
| CEASE | Y | N | N | N | N | N | N | X | N | A2_X | A1_X | IMM_X | DW_X | FN_X | N | M_X | N | N | N | N | N | N | N | CSR.I | N | N | N | N | |
| CSRRW | Y | N | N | N | N | N | N | Y | N | A2_ZERO | A1_RS1 | IMM_X | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.W | N | N | N | N | |
| CSRRS | Y | N | N | N | N | N | N | Y | N | A2_ZERO | A1_RS1 | IMM_X | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.S | N | N | N | N | |
| CSRRC | Y | N | N | N | N | N | N | Y | N | A2_ZERO | A1_RS1 | IMM_X | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.C | N | N | N | N | |
| CSRRWI | Y | N | N | N | N | N | N | N | N | A2_IMM | A1_ZERO | IMM_Z | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.W | N | N | N | N | |
| CSRRSI | Y | N | N | N | N | N | N | N | N | A2_IMM | A1_ZERO | IMM_Z | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.S | N | N | N | N | |
| CSRRCI | Y | N | N | N | N | N | N | N | N | A2_IMM | A1_ZERO | IMM_Z | DW_XPR | FN_ADD | N | M_X | N | N | N | N | N | N | Y | CSR.C | N | N | N | N |
For the Rocket Chip, we can see that due to the modular structure of the RISC-V ISA, each subset is a new class which is added or not to the CPU thanks to input parameters.
These parameters can be explained as follows:
| Paramètre | Description |
|---|---|
| legal | Valid instruction |
| fp | Floating point |
| rocc | RoCC (Rocket Custom Coprocessor) |
| branch | Branch |
| jal | Jump and link |
| jalr | Jump and link register |
| rxs2 | Instruction using rs2 : BEQ rs1, rs2, imm ou XOR rd, rs1, rs2 |
| rxs1 | Instruction using rs1 : BEQ rs1, rs2, imm ou LB rd, rs1, imm |
| scie | SCIE = Sifive Custom Instruction Extension. Only one instruction use it |
| sel_alu2 | Operand 2 for ALU (if needed) |
| sel_alu1 | Operand 1 for ALU (if needed) |
| sel_imm | Immediate for ALU (if needed) |
| alu_dw | Data width for ALU |
| alu_fn | Function used in ALU |
| mem | Memory-related instructions |
| mem_cmd | Specify a given memory operation (load, store…) |
| rfs1 | Only used in floating-related instructions |
| rfs2 | Only used in floating-related instructions |
| rfs3 | Not used, probably for future use |
| wfd | Used to log some flaoting-related events |
| mul | Multiplication (M) ou something else (N) |
| div | Division (Y) ou multiplication (D) |
| wxd | Usef for logging |
| csr | CSR.N for instructions not related to CSR |
| fence_i | For the fence_i instruction_ |
| fence | For the fence instruction |
| amo | Atomic extension |
| dp | Double-precision extension |
Playing with the data width of load/store instructions
Instructions encoding
In order to add an instruction in the Rocket CPU, two things must be done:
-
Adding the instruction encoding: example of the LB instruction
-
Adding the parameter dictionary values: example fo the LB instruction
The new instruction can be added in an existing dictionary or by creating a new dictionary as in [2]. In this article, let’s say we want to add a custom LR load instruction.
// Instructions.scala
def LB1 = BitPat("b?????????????????000?????0011111")
// IDecode.scala
LB1-> List(Y,N,N,N,N,N,N,Y,N,A2_IMM, A1_RS1, IMM_I, DW_XPR,FN_ADD, Y,M_XRD, N,N,N,N,N,N,Y,CSR.N,N,N,N,N),Data width for load instructions
Similar assumptions can be made with store instructions.
Load instructions are specified with the same set of parameters rocket-chip/IDecode.scala at dc74f388124704e0838377fe94074175300aff19 · QDucasse/rocket-chip · GitHub. Therefore, how can we find the data width for the load ?
The size is given here
ex_reg_mem_size := Mux(usingHypervisor && id_system_insn, id_inst(0)(27, 26), id_inst(0)(13, 12))val id_inst = id_expanded_inst.map(_.bits) // id_inst is a binary version of the instruction
ex_reg_mem_size := Mux(usingHypervisor && id_system_insn, id_inst(0)(27, 26), id_inst(0)(13, 12)) // Execute stage: size is given in 13,12 bits of the instruction
mem_reg_mem_size := ex_reg_mem_size // Memory stage: size is transmitted here
wb_reg_mem_size := mem_reg_mem_size // Writeback stage: size is transmitted hereThen, we can confirm by having a look at instruction encoding of load instructions:

-
Bits 13 and 12 (low bits of the
funct3field) are different for each load: it specify the data width as shown in previous code snippets. -
Bit 14 is related to unsigned loads:
io.dmem.req.bits.signed := !Mux(ex_reg_hls, ex_reg_inst(20), ex_reg_inst(14))