GCC Peephole2
1. GCC Peephole2
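Peephole optimization is an architecture-dependent RTL optimization; it amounts to a find-and-replace over insn patterns. For example, if the architecture has an inc instruction, a peephole can replace `add a0, a0, 1` with `inc a0`.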
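As a rough illustration of the find-and-replace idea, here is a minimal C sketch over a toy instruction list. It assumes a made-up ISA: `struct toy_insn`, `enum toy_op` and `toy_peephole_inc` are hypothetical names and have nothing to do with GCC's real RTL data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy ISA for illustration only; this is not GCC's RTL. */
enum toy_op { TOY_ADDI, TOY_INC, TOY_OTHER };

struct toy_insn {
    enum toy_op op;
    int rd;    /* destination register number */
    int rs;    /* source register number */
    long imm;  /* immediate, only meaningful for TOY_ADDI */
};

/* Scan the instruction list and rewrite every "addi rd, rd, 1" into
   "inc rd" in place.  Returns the number of instructions rewritten. */
size_t toy_peephole_inc(struct toy_insn *insns, size_t n)
{
    size_t rewritten = 0;

    assert(insns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct toy_insn *in = &insns[i];

        /* "pattern" and "condition": an add-immediate of 1 whose
           source and destination are the same register */
        if (in->op == TOY_ADDI && in->rd == in->rs && in->imm == 1) {
            /* "replacement": the single-operand inc form */
            in->op = TOY_INC;
            in->imm = 0;
            rewritten++;
        }
    }
    return rewritten;
}
```

Only the first of `addi a0, a0, 1`, `addi a0, a1, 1` and `addi a2, a2, 2` would be rewritten, since the other two fail the register or immediate check.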
1.1. define_peephole2
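The general form of define_peephole2 is roughly:

(define_peephole2
  [insn-pattern-1
   insn-pattern-2
   ...]
  "condition"
  [new-insn-pattern-1
   new-insn-pattern-2
   ...]
  "preparation-statements")

That is, the pass scans for the original insn-pattern sequence and, if the condition holds, replaces it with the new insns. During the replacement, the C code in `preparation-statements` can process the matched operands.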
1.2. example
Suppose we want a peephole that rewrites `x<<3` into `x*8`.

int foo(int x)
{
    int y = 8 * x;
    return y;
}

After compilation, its RTL is:

$> /opt/riscv/bin/riscv64-unknown-linux-gnu-gcc $PWD/test.c -O2 -c -fdump-rtl-peephole2
$> cat test.c.304r.peephole2
...
(insn 6 7 13 2 (set (reg:DI 10 a0)
        (sign_extend:DI (ashift:SI (reg:SI 10 a0 [76])
                (const_int 3 [0x3])))) "/home/sunway/download/b/test.c":3:9 168 {*ashlsi3_extend}
     (nil))
...

We want a peephole to turn this `ashift 3` into `mult 8`.
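The rewrite rests on the identity x << n == x * 2^n (modulo the word size). The following is a minimal C sketch of the constant computation that a preparation statement has to perform for this, not code from GCC; the function names are hypothetical. It widens before shifting, because a plain `1 << n` is an `int` shift and overflows for n = 31. Inside GCC, the analogous spelling would presumably be `HOST_WIDE_INT_1 << n`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the multiplier that is equivalent to a left
   shift by n.  Valid for 0 <= n < 63. */
int64_t shift_count_to_multiplier(unsigned n)
{
    assert(n < 63);
    /* Widen first: `1 << 31` would overflow a 32-bit int. */
    return (int64_t)1 << n;
}

/* Returns 1 if a 32-bit left shift by n and a 32-bit multiply by 2^n
   give the same result for x (both taken modulo 2^32). */
int shift_equals_multiply(uint32_t x, unsigned n)
{
    assert(n < 32);
    uint32_t by_shift = x << n;
    uint64_t product = (uint64_t)x * (uint64_t)shift_count_to_multiplier(n);
    uint32_t by_mul = (uint32_t)product;  /* keep the low 32 bits */
    return by_shift == by_mul;
}
```

For the example in this note, a shift count of 3 gives the multiplier 8.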
But there is a problem:

RISC-V's mul is an R-type instruction, i.e. both source operands are registers, so it seems the peephole would need to be written something like this:

(define_peephole2
  [(set (match_operand:DI 0 "register_operand")
        (sign_extend:DI
          (ashift:SI (match_operand:SI 1 "register_operand")
                     (match_operand 2 "const_int_operand"))))]
  ""
  [(set (match_operand:DI 3 "register_operand")
        (sign_extend:DI (match_operand 1)))
   (set (match_operand:DI 4 "register_operand")
        (match_operand 2))
   (set (match_dup 0)
        (mult:DI (match_dup 3) (match_dup 4)))]
{
  operands[2] = GEN_INT (1<<(INTVAL (operands[2])));
})

However, define_peephole2 does not support "allocating" a new register through match_operand in the new-insn part: pass_peephole2 runs after pass_reload, i.e. register allocation (RA) has already finished, so no new pseudo register can be "allocated".
1.2.1. Using physical registers
Although a pseudo register cannot be allocated through match_operand, using a physical register directly via `(reg:DI …)` does work:

(define_peephole2
  [(set (match_operand:DI 0 "register_operand")
        (sign_extend:DI
          (ashift:SI (match_operand:SI 1 "register_operand")
                     (match_operand 2 "const_int_operand"))))]
  ""
  [(set (reg:DI 28) (sign_extend:DI (match_operand 1)))
   (set (reg:DI 29) (match_operand 2))
   (set (match_dup 0) (mult:DI (reg:DI 28) (reg:DI 29)))]
{
  operands[2] = GEN_INT (1<<(INTVAL (operands[2])));
})

The result after the transformation is:

$> cat test.c.304r.peephole2
...
(insn 24 7 25 2 (set (reg:DI 28 t3)
        (sign_extend:DI (reg:SI 10 a0 [76]))) "/home/sunway/download/b/test.c":3:9 -1
     (nil))
(insn 25 24 26 2 (set (reg:DI 29 t4)
        (const_int 8 [0x8])) "/home/sunway/download/b/test.c":3:9 -1
     (nil))
(insn 26 25 13 2 (set (reg:DI 10 a0)
        (mult:DI (reg:DI 28 t3)
            (reg:DI 29 t4))) "/home/sunway/download/b/test.c":3:9 -1
     (nil))
...
$> /opt/riscv/bin/riscv64-unknown-linux-gnu-objdump -d ./test.o
0000000000000000 <foo>:
   0:   00050e1b    sext.w  t3,a0
   4:   4ea1        li      t4,8
   6:   03de0533    mul     a0,t3,t4
   a:   8082        ret

But this approach is not actually viable, because physical registers (t3, t4) cannot be hardcoded in the pattern: nothing guarantees that they are free at the point where the peephole matches.
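Independently of the register question, the rewritten sequence shown above can be modeled in C and compared with the original insn. The sketch below is an illustration, not GCC code: the function names are hypothetical, and 64-bit register contents are represented as `uint64_t`. Under this model the two forms agree as long as the 32-bit shift does not overflow, but they can differ when it does, because `mult:DI` keeps the full 64-bit product while `sign_extend:DI (ashift:SI …)` first truncates to 32 bits. For the C source used here that case is signed overflow and therefore undefined, so it may not matter for this particular test.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit value into a 64-bit register image. */
static uint64_t sext32(uint32_t v)
{
    return (v & 0x80000000u) ? (0xFFFFFFFF00000000ULL | v) : (uint64_t)v;
}

/* Model of the original insn:
   (set a0 (sign_extend:DI (ashift:SI a0 n))) */
uint64_t shift_then_extend(uint32_t a0, unsigned n)
{
    assert(n < 32);
    return sext32(a0 << n);  /* shift in 32 bits, then sign-extend */
}

/* Model of the rewritten sequence:
   sext.w t3,a0 ; li t4,2^n ; mul a0,t3,t4 */
uint64_t extend_then_mul(uint32_t a0, unsigned n)
{
    assert(n < 32);
    uint64_t t3 = sext32(a0);
    uint64_t t4 = (uint64_t)1 << n;
    return t3 * t4;  /* low 64 bits of the product, no truncation to 32 bits */
}
```

With n = 3, both functions return 40 for an input of 5 and agree for an input of -1 (0xFFFFFFFF), but for 0x20000000 the first returns 0 while the second returns 0x100000000.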
1.2.2. Using match_scratch
Modify the define_insn that generates ashift so that it "reserves" a scratch reg via match_scratch, then have the peephole use that scratch reg.

Modified md:

riscv.md:

(define_insn "<optab>si3"
  [(parallel
     [;; NOTE: add a match_scratch
      (match_scratch:SI 3 "=r")
      (set (match_operand:SI 0 "register_operand" "= r")
           (any_shift:SI
             (match_operand:SI 1 "register_operand" " r")
             (match_operand:QI 2 "arith_operand"    " rI")))])]
  ""
{
  if (GET_CODE (operands[2]) == CONST_INT)
    operands[2] = GEN_INT (INTVAL (operands[2])
                           & (GET_MODE_BITSIZE (SImode) - 1));

  return TARGET_64BIT ? "<insn>%i2w\t%0,%1,%2" : "<insn>%i2\t%0,%1,%2";
}
  [(set_attr "type" "shift")
   (set_attr "mode" "SI")])

peephole.md:

(define_peephole2
  [(parallel
     [(match_scratch:SI 3)
      (set (match_operand:SI 0 "register_operand")
           (ashift:SI (match_operand:SI 1 "register_operand")
                      (match_operand 2 "const_int_operand")))])]
  ""
  [(set (match_dup 3) (match_operand 2))
   (set (match_dup 0) (mult:SI (match_dup 1) (match_dup 3)))]
{
  operands[2] = GEN_INT (1<<(INTVAL (operands[2])));
})

Generated RTL:

$> /opt/riscv/bin/riscv64-unknown-linux-gnu-gcc test.c -O2 -fdump-rtl-all -c
$> cat test.c.302r.compgotos
...
(insn 6 7 12 2 (parallel [
            ;; the new register allocated for match_scratch
            (reg:SI 15 a5 [77])
            (set (reg:SI 10 a0 [orig:74 y ] [74])
                (ashift:SI (reg:SI 10 a0 [76])
                    (const_int 3 [0x3])))
        ]) "test.c":3:9 150 {ashlsi3}
     (expr_list:REG_DEAD (reg:SI 15 a5 [77])
        (nil)))
...
$> cat test.c.304r.peephole2
...
(insn 24 7 25 2 (set (reg:SI 15 a5 [77])
        (const_int 8 [0x8])) "test.c":3:9 -1
     (nil))
(insn 25 24 12 2 (set (reg:SI 10 a0 [orig:74 y ] [74])
        (mult:SI (reg:SI 10 a0 [76])
            (reg:SI 15 a5 [77]))) "test.c":3:9 -1
     (nil))
...