RISC-V Toolchain Patch

1. RISC-V Toolchain Patch

1.1. T-Head

1.1.1. gcc

https://github.com/T-head-Semi/gcc/commit/f342d033dbaa0a748c4867b1edb7d97dddd71873

Changes to gcc mainly cover:

latency and pipeline information for instructions
p extension
v extension
zfh extension
t-head custom extensions

In detail:

The p/v extensions mainly implement intrinsics.
The zfh extension has no corresponding intrinsics because gcc already supports the `__fp16` type and `HF` mode; zfh only needs md definitions. For example, where the standard F extension defines addsf3, zfh only needs to define addhf3.
The t-head custom extensions (see https://github.com/T-head-Semi/thead-extension-spec/blob/master/intro.adoc and https://occ-oss-prod.oss-cn-hangzhou.aliyuncs.com/resource/undefined/1623288429903/%E7%8E%84%E9%93%81E906R2S0%E7%94%A8%E6%88%B7%E6%89%8B%E5%86%8C_v03.pdf) mainly include mac (multiply-accumulate), vdot (vector-dot), bb (bit-manipulation), and others.

1.1.1.1. by file

Excluding script-generated code and test code, about 20K lines remain, roughly half scripts and half md (machine description) files.

2396 files changed, 361667 insertions(+), 301 deletions(-)
gcc/combine.c                                      |      5 +
gcc/common/config/riscv/riscv-common.c             |    112 +-
---parses the newly added march strings
gcc/config.gcc                                     |      8 +-
gcc/config/riscv/c906.md                           |    156 +
---adds latency and pipeline information for c906

gcc/config/riscv/constraints.md                    |      2 +-
gcc/config/riscv/predicates.md                     |      4 +
gcc/config/riscv/riscv-builtins-p.def              |    513 +
---builtin declarations for the p extension, generated by script

gcc/config/riscv/riscv-builtins-thead.c            |    388 +
gcc/config/riscv/riscv-builtins-thead.h            |     20 +
gcc/config/riscv/riscv-builtins-v-a.def            |    121 +
gcc/config/riscv/riscv-builtins-v.def              |  22362 +++
---builtin declarations for the v extension

gcc/config/riscv/riscv-builtins.c                  |     67 +-
---init/expand builtin

gcc/config/riscv/riscv-c.c                         |     30 +
---predefined macros added by t-head, e.g. __riscv_xthead

gcc/config/riscv/riscv-cores.def                   |      9 +
---declarations of the cpu, arch, and tune entries added by t-head

gcc/config/riscv/riscv-dsp.h                       |   4686 +
---generated by riscv-p-builtins-gen.py, for use by applications

gcc/config/riscv/riscv-fp16.md                     |    342 +
---md for the zfh extension

gcc/config/riscv/riscv-ftypes-p.def                |    114 +
gcc/config/riscv/riscv-ftypes-special.def          |    332 +
gcc/config/riscv/riscv-ftypes-v.def                |   5252 +
---builtin function prototype declarations for the p/v extensions, generated by script

gcc/config/riscv/riscv-modes.def                   |     26 +
gcc/config/riscv/riscv-opts.h                      |     14 +-
gcc/config/riscv/riscv-p-builtins-gen.py           |   4684 +
gcc/config/riscv/riscv-p.md                        |   3591 +
---md for the p extension

gcc/config/riscv/riscv-passes-thead.def            |      2 +
gcc/config/riscv/riscv-protos.h                    |      4 +
gcc/config/riscv/riscv-seg-modes.def               |     77 +
gcc/config/riscv/riscv-thead-dfsrm.c               |     92 +
---custom rtl pass

gcc/config/riscv/riscv-thead-dsext.c               |    673 +
---custom rtl pass that removes redundant sext; for example, a+(b<<c) can produce a redundant
sext instruction, which prevents addsl from matching

gcc/config/riscv/riscv-thead-tune.h                |     53 +
---thead tune definitions

gcc/config/riscv/riscv-thead.c                     |   1188 +
gcc/config/riscv/riscv-thead.h                     |    173 +
gcc/config/riscv/riscv-thead.md                    |   1081 +
---md for the thead custom extensions

gcc/config/riscv/riscv-v-auto.md                   |    530 +
gcc/config/riscv/riscv-v-builtins-gen.py           |   4029 +
gcc/config/riscv/riscv-v-float.md                  |   2621 +
gcc/config/riscv/riscv-v-iterators.md              |   2459 +
gcc/config/riscv/riscv-v-mem.md                    |    838 +
gcc/config/riscv/riscv-v-seg-iterators.md          |    110 +
gcc/config/riscv/riscv-v-segmem.md                 |    751 +
gcc/config/riscv/riscv-v.h                         |     81 +
gcc/config/riscv/riscv-v.md                        |   3940 +
---md for the v extension

gcc/config/riscv/riscv-vector-seg-type.def         |    121 +
gcc/config/riscv/riscv-vector-type.def             |    120 +
gcc/config/riscv/riscv-vector.h                    | 174973 ++++++++++++++++++
---generated by script, for use by user programs

gcc/config/riscv/riscv.c                           |    459 +-
---new registers, tune info, cost

gcc/config/riscv/riscv.h                           |    135 +-
gcc/config/riscv/riscv.md                          |    182 +-
gcc/config/riscv/riscv.opt                         |    162 +-
gcc/genemit.c                                      |    104 +-
gcc/ipa-inline.c                                   |     11 +-
gcc/loop-iv.c                                      |     22 +-
gcc/testsuite/gcc.target/riscv/dsp/add16.c         |     14 +
gcc/testsuite/gcc.target/riscv/dsp/add32.c         |     14 +
gcc/testsuite/gcc.target/riscv/dsp/add8.c          |     14 +
---dg tests for the p/v instructions, generated by script

1.1.1.2. by insn

1.1.1.2.1. p extension

riscv-builtins-p.def declares builtins with the DIRECT_BUILTIN macro:

DIRECT_BUILTIN (zunpkd810_v4qi, RISCV_UV2HI_FTYPE_UV4QI, dsp32),
DIRECT_BUILTIN (zunpkd810_v8qi, RISCV_UV4HI_FTYPE_UV8QI, dsp64),

Here zunpkd810_v4qi means the builtin must correspond to an insn in riscv-p.md named riscv_zunpkd810_v4qi:

(define_insn "riscv_<unpkd_int_str>_<mode>"
  [(set (match_operand:<vqvhmod_attr> 0 "register_operand" "=r")
     (unspec:<vqvhmod_attr>
      [(match_operand:VQIMOD 1 "register_operand" "r")]
       UNSPEC_UNPKD))]
  "TARGET_XTHEAD_DSP"
  "<unpkd_int_insn>\\t%0,%1"
)

Here unpkd_int_str is an attribute that takes values such as zunpkd810 and zunpkd820, which keeps the md compact.

RISCV_UV2HI_FTYPE_UV4QI is the function prototype: the function takes a single argument of type UV4QI (uint8x4_t) and returns UV2HI (uint16x2_t).
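
To make the prototype concrete, here is a minimal C reference model of what the builtin computes on one 32-bit word. It assumes the zunpkd8xy semantics from the P extension draft (halfword 1 takes zero-extended byte x, halfword 0 takes zero-extended byte y); the function name zunpkd8xy_model is hypothetical and not part of the T-Head sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of zunpkd8xy on a single 32-bit word.
   Assumed semantics: result.H[1] = zext(src.B[x]), result.H[0] = zext(src.B[y]).  */
uint32_t zunpkd8xy_model (uint32_t rs1, unsigned x, unsigned y)
{
  assert (x < 4 && y < 4);                      /* byte indices within one word */
  uint32_t byte_x = (rs1 >> (8 * x)) & 0xffu;   /* becomes halfword 1 */
  uint32_t byte_y = (rs1 >> (8 * y)) & 0xffu;   /* becomes halfword 0 */
  return (byte_x << 16) | byte_y;
}
```

Under these assumptions, zunpkd810 is `zunpkd8xy_model (v, 1, 0)` and zunpkd820 is `zunpkd8xy_model (v, 2, 0)`, which is why a single md pattern with an iterator attribute can cover the whole family.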

These prototypes are defined in riscv-ftypes-p.def, for example:

// expands to RISCV_UV2HI_FTYPE_UV4QI
DEF_RISCV_FTYPE (1, (UV2HI, UV4QI))

// expands to RISCV_DI_FTYPE_DI_DI
DEF_RISCV_FTYPE (2, (DI, DI, DI))
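
The expansion noted in the comments above can be reproduced with a small self-contained sketch. It follows the X-macro pattern that upstream gcc uses in its riscv builtin code and assumes the T-Head tree does the same; FTYPE_LIST stands in for the contents of the .def file, and ftype_names, ftype_name, and ftype_from_name are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Name builders, one per argument count.  */
#define RISCV_FTYPE_NAME1(A, B)    RISCV_##A##_FTYPE_##B
#define RISCV_FTYPE_NAME2(A, B, C) RISCV_##A##_FTYPE_##B##_##C

/* Stand-in for the .def file (only the two entries shown above).  */
#define FTYPE_LIST                      \
  DEF_RISCV_FTYPE (1, (UV2HI, UV4QI))   \
  DEF_RISCV_FTYPE (2, (DI, DI, DI))

/* First expansion: one enum constant per prototype.  */
#define DEF_RISCV_FTYPE(NARGS, LIST) RISCV_FTYPE_NAME##NARGS LIST,
enum riscv_function_type { FTYPE_LIST RISCV_MAX_FTYPE_MAX };
#undef DEF_RISCV_FTYPE

/* Second expansion (illustration only): the same names as strings.  */
#define STR_(X) #X
#define STR(X) STR_ (X)
#define DEF_RISCV_FTYPE(NARGS, LIST) STR (RISCV_FTYPE_NAME##NARGS LIST),
static const char *const ftype_names[] = { FTYPE_LIST };
#undef DEF_RISCV_FTYPE

/* Map an enum constant to its printable name.  */
const char *ftype_name (enum riscv_function_type t)
{
  assert (t < RISCV_MAX_FTYPE_MAX);
  return ftype_names[t];
}

/* Map a printable name back to its enum value, or -1 if unknown.  */
int ftype_from_name (const char *name)
{
  for (int i = 0; i < RISCV_MAX_FTYPE_MAX; i++)
    if (strcmp (ftype_names[i], name) == 0)
      return i;
  return -1;
}
```

Listing each prototype once and expanding the list several times keeps the enum and any derived tables in sync, which matters when the list is script-generated and thousands of lines long.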

1.1.1.2.2. v extension

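The v extension is similar to the p extension. The difference is that v uses dedicated v registers, while p uses the x registers.

As a result, some of the v extension code is register-related, for example:

riscv_compute_frame_info has to consider whether v_reg needs to be saved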

1.1.1.2.3. zfh extension

zfh does not need to provide intrinsics because gcc itself supports __fp16; thead only has to define the corresponding md for HF mode.

1.1.1.2.4. thead extension

https://github.com/T-head-Semi/thead-extension-spec

xtheadba

addsl

addsl computes a+(b<<c) in a single instruction. To implement it, t-head only needs to define its md:

(define_insn "*xthead_addsl<mode>"
  [(set (match_operand:X 0 "register_operand" "=r")
    (plus:X (ashift:X (match_operand:X 2 "register_operand" "r")
              (match_operand:QI 3 "const_twobit_operand" "i"))
        (match_operand:X 1 "register_operand" "r")))]
   "TARGET_XTHEAD_ADDSL"
   "addsl\t%0, %1, %2, %3"
   [(set_attr "type" "arith")]
)

In the rtl combine pass, gcc merges the two rtl expressions (ashift, plus) into one (plus (ashift)); the combined rtl then matches this insn template, which is used to emit the instruction.
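
As a rough illustration of the pattern being matched, the sketch below models addsl in C and shows a typical source expression that reduces to it. It assumes that const_twobit_operand accepts shift amounts 0 to 3 (inferred from the predicate name); addsl_model and element_address are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "addsl rd, rs1, rs2, imm2": rd = rs1 + (rs2 << imm2).  */
uint64_t addsl_model (uint64_t rs1, uint64_t rs2, unsigned imm2)
{
  assert (imm2 <= 3);   /* assumed range of const_twobit_operand */
  return rs1 + (rs2 << imm2);
}

/* Address of element idx in an array of 8-byte elements.
   The multiply by 8 is a shift by 3, so this has the plus (ashift) shape.  */
uint64_t element_address (uint64_t base, uint64_t idx)
{
  return base + idx * 8;
}
```

Array indexing with 2-, 4-, or 8-byte elements is the common source of this shape, which is why a redundant sext between the shift and the add is worth removing.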

xtheadbb

ext

ext/extu are instructions provided by t-head to speed up sext and zext.

(define_insn "*xthead_extend<SHORT:mode><X:mode>2"
  [(set (match_operand:X       0 "register_operand"    "=r")
    (sign_extend:X
      (match_operand:SHORT  1 "register_operand"    "r")))]
  "TARGET_XTHEAD_EXT"
  {
    operands[2] = GEN_INT (GET_MODE_BITSIZE (<SHORT:MODE>mode) - 1);
    return "ext\t%0,%1,%2,0";
  }
  [(set_attr "type" "arith")
   (set_attr "mode" "<X:MODE>")])

The corresponding cost was modified as well:

static int riscv_extend_cost(rtx op, bool unsigned_p) {
    if (MEM_P(op)) return 0;

    if (unsigned_p && GET_MODE(op) == QImode) /* We can use ANDI.  */
        return COSTS_N_INSNS(1);

    if (!unsigned_p && GET_MODE(op) == SImode) /* We can use SEXT.W.  */
        return COSTS_N_INSNS(1);

    if (!optimize_size && TARGET_XTHEAD_C) return tune_param->extend;

    /* We need to use a shift left and a shift right.  */
    return COSTS_N_INSNS(2);
}
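
The two-shift fallback mentioned in the last comment can be compared with a bit-field sign extension in plain C. This is a sketch under the assumption that "ext rd, rs1, msb, lsb" sign-extends bits msb..lsb of rs1, as the md above suggests with msb = bitsize - 1 and lsb = 0; ext_model and sext_by_shifts are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "ext rd, rs1, msb, lsb": sign-extend bits msb..lsb.  */
int64_t ext_model (uint64_t rs1, unsigned msb, unsigned lsb)
{
  assert (lsb <= msb && msb < 64);
  unsigned width = msb - lsb + 1;
  uint64_t field = rs1 >> lsb;
  if (width < 64)
    field &= (UINT64_C (1) << width) - 1;       /* keep only the field bits */
  uint64_t sign = UINT64_C (1) << (width - 1);  /* sign bit of the field */
  return (int64_t) ((field ^ sign) - sign);     /* propagate the sign bit */
}

/* The generic fallback: shift left, then arithmetic shift right.
   Relies on arithmetic right shift of negative values, as gcc provides.  */
int64_t sext_by_shifts (uint64_t x, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  unsigned shift = 64 - bits;
  return (int64_t) (x << shift) >> shift;
}
```

Both functions give the same result for a QImode or HImode value, so a single ext can stand in for the shift pair, which is what the lower cost returned for TARGET_XTHEAD_C reflects.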

srri

(define_insn "rotrsi3"
  [(set (match_operand:SI              0 "register_operand"     "=r")
    (rotatert:SI (match_operand:SI 1 "register_operand"     "r")
             (match_operand:SI 2 "const_int_operand"    "QcL")))]
  "TARGET_XTHEAD_SRRIW || (TARGET_XTHEAD_SRRI && !TARGET_64BIT)"
  {
    return TARGET_XTHEAD_SRRIW ? "srriw\t%0, %1, %2" : "srri\t%0, %1, %2";
  }
  [(set_attr "type" "arith")
   (set_attr "mode" "DI")]
)

gcc itself already defines rotr_optab, so the backend only needs to define patterns such as rotrsi3 to support it.
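
For reference, this is the usual C idiom for a 32-bit rotate right, which gcc generally recognizes as a rotate and can then expand through a rotrsi3 pattern when the target provides one. The function name rotr32 is hypothetical; whether a particular build actually selects srri or srriw depends on the target flags shown in the insn condition.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 <= n < 32).
   The "& 31" keeps the left shift well defined when n is 0.  */
uint32_t rotr32 (uint32_t x, unsigned n)
{
  assert (n < 32);
  return (x >> n) | (x << ((32 - n) & 31));
}
```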

xtheadint

ipush

ipush speeds up interrupt handler processing by pushing multiple registers onto the stack with a single instruction.

For the implementation, t-head defines its md:

(define_insn "riscv_ipush"
  [(unspec_volatile [(const_int 0)] UNSPECV_IPUSH)
   (use (reg RETURN_ADDR_REGNUM))
   (use (reg T0_REGNUM))
   (use (reg T1_REGNUM))
   // ...
   (use (reg T4_REGNUM))
   (use (reg T5_REGNUM))
   (use (reg T6_REGNUM))]
  "TARGET_XTHEAD_IPUSH"
  "ipush")

The code that generates the function prologue and epilogue was modified as well:

void riscv_expand_prologue(void) {
    // ...
    if (TARGET_XTHEAD_INTERRUPT_HANDLER_P()) {
        rtx dwarf = riscv_adjust_ipush_cfi_prologue();
        frame->mask &= ~frame->imask;
        frame->gp_sp_offset -= frame->save_ipush_adjustment;
        size -= frame->save_ipush_adjustment;
        insn = emit_insn(gen_riscv_ipush());
        // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

        RTX_FRAME_RELATED_P(insn) = 1;
        REG_NOTES(insn) = dwarf;
    }
    // ...
}

xtheadmempair

ldd/sdd

(define_insn "load_pairdi"
  [(set (match_operand:DI 0 "register_operand"        "=r")
    (match_operand:DI 1 "riscv_mem_pair_operand"  "Qmp"))
   (set (match_operand:DI 2 "register_operand"        "=r")
    (mem:DI (plus:DI (match_operand:DI 3 "register_operand"      "r")
             (match_operand  4 "const_Pi_operand"     "Pi"))))]
  "TARGET_XTHEAD_LDD
  && rtx_equal_p (plus_constant (Pmode, operands[3], INTVAL (operands[4])),
      plus_constant (Pmode, XEXP (operands[1], 0), GET_MODE_SIZE (DImode)))
  && REGNO (operands[0]) != REGNO (operands[3])
  && REGNO (operands[2]) != REGNO (operands[3])"
  "ldd\t%0,%2,(%3),%j4,4"
  [(set_attr "type" "load")
   (set_attr "mode" "DI")])

Two ld instructions that access consecutive memory can be merged into a single ldd instruction.
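
The insn condition above can be restated as a small C predicate. This is only a sketch of the check, not the gcc code: struct di_load, can_merge_to_ldd, and load_pair_sum are hypothetical, and registers are represented by plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one 8-byte load: rd = mem[base + offset].  */
struct di_load { int rd; int base; int64_t offset; };

/* Mirrors the md condition: the second address must equal the first plus
   8 bytes, and neither destination register may be the base register.  */
bool can_merge_to_ldd (struct di_load first, struct di_load second)
{
  bool adjacent = first.base == second.base
                  && second.offset == first.offset + 8;
  bool base_preserved = first.rd != second.base && second.rd != second.base;
  return adjacent && base_preserved;
}

/* A source pattern that yields two adjacent 8-byte loads.  */
int64_t load_pair_sum (const int64_t *p)
{
  assert (p != NULL);
  return p[0] + p[1];
}
```

The base-register check matters because an ldd that overwrote its own base register would leave the address of the second half undefined.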

xtheadmemidx

flrd

xtheadmac

mula

(define_insn "*xthead_madd<mode>"
  [(set (match_operand:X 0 "register_operand" "=r")
          (plus:X (mult:X (match_operand:X 1 "register_operand" "r")
                  (match_operand:X 2 "register_operand" "r"))
              (match_operand:X 3 "register_operand" "0")))]
  "TARGET_MUL && TARGET_XTHEAD_MULA"
  "mula\\t%0,%1,%2"
  [(set_attr "type" "imul")
   (set_attr "mode" "<MODE>")]
)

REG[rd]+=REG[rs1]*REG[rs2]
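
The formula above maps directly onto the accumulation step of a dot product. The sketch below is a plain C model with hypothetical names (mula_model, dot_product); it shows the plus (mult) shape that the insn pattern matches, with the accumulator tied to the destination register as the "0" constraint requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "mula rd, rs1, rs2": rd = rd + rs1 * rs2.  */
uint64_t mula_model (uint64_t rd, uint64_t rs1, uint64_t rs2)
{
  return rd + rs1 * rs2;
}

/* Each loop iteration is one multiply-accumulate on the running sum.  */
uint64_t dot_product (const uint64_t *a, const uint64_t *b, size_t n)
{
  assert (a != NULL && b != NULL);
  uint64_t acc = 0;
  for (size_t i = 0; i < n; i++)
    acc = mula_model (acc, a[i], b[i]);
  return acc;
}
```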

1.1.2. gdb

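The p/v/zfh and t-head custom extensions need to be supported here as well, but the changes are mainly in opcodes and concern disassembly. A small part concerns gdbarch (tdep), such as the information for the newly added v registers.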

1.1.3. qemu

https://github.com/T-head-Semi/qemu/commit/fd25f40f7d66d6fe5caffa9ef4b610434de6cf42#diff-8f4b85f7d35aa3058c602811cdac2dfa6c09fbea8d897de3b2e0c6905be3fbfa

Upstream qemu already supports the v extension.

t-head mainly adds support for p/zfh and the t-head custom extensions. The changes are mostly in QEMU TCG, including insn32.decode, translate.c, and xxx_helper.c.

1.1.4. spike

There are no t-head-specific changes; it appears that only https://github.com/openhwgroup/force-riscv was added to spike.