GCC Peephole2
1. GCC Peephole2
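Peephole optimization is an architecture-specific RTL optimization. It amounts to a search-and-replace over insn patterns. For example, if the target supports an `inc` instruction, a peephole can replace `add a0, a0, 1` with `inc a0`.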
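The search-and-replace idea can be sketched outside of GCC. The following is a minimal illustration in C over a made-up instruction array; the `toy_*` types, opcodes and function names are assumptions for this example only and have nothing to do with GCC's real RTL data structures.

```c
#include <stddef.h>

/* Hypothetical toy instruction set; this is not GCC's RTL. */
enum toy_opcode { TOY_ADDI, TOY_INC, TOY_OTHER };

struct toy_insn {
    enum toy_opcode op;
    int rd;   /* destination register number */
    int rs;   /* source register number (unused by TOY_INC) */
    int imm;  /* immediate operand (unused by TOY_INC) */
};

/* Rewrite every "addi rd, rd, 1" into "inc rd" in place.
   Returns how many instructions were rewritten. */
size_t toy_peephole_inc(struct toy_insn *code, size_t n)
{
    size_t rewritten = 0;
    for (size_t i = 0; i < n; i++) {
        struct toy_insn *in = &code[i];
        /* The "pattern": an add-immediate whose source and destination
           are the same register and whose immediate is exactly 1. */
        if (in->op == TOY_ADDI && in->rd == in->rs && in->imm == 1) {
            in->op = TOY_INC;
            in->imm = 0;
            rewritten++;
        }
    }
    return rewritten;
}
```

Only an instruction with exactly the expected shape is rewritten; `addi a0, a1, 1` or `addi a0, a0, 2` is left alone.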
1.1. define_peephole2
The general form of define_peephole2 is:
(define_peephole2
[insn-pattern-1
insn-pattern-2
...]
"condition"
[new-insn-pattern-1
new-insn-pattern-2
...]
"preparation-statements")
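That is, the pass scans for the original insn patterns; when they match and the condition holds, they are replaced with the new insns. The C code in `preparation-statements` runs before the replacement is generated and can adjust the operands.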
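The four parts of the template (pattern, condition, preparation, replacement) can be mimicked in plain C. This is only a rough sketch over a hypothetical one-instruction IR; `rule_insn`, the `R_*` opcodes and the `rule_*` functions are invented for illustration and are not GCC APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy IR; this is not GCC's RTL. */
enum rule_op { R_SHL_IMM, R_MUL_IMM, R_NOP };

struct rule_insn {
    enum rule_op op;
    int rd;        /* destination register number */
    int rs;        /* source register number */
    int64_t imm;   /* shift count or multiplier */
};

/* 1. Pattern: does the instruction have the shape we are looking for? */
bool rule_match(const struct rule_insn *in)
{
    return in->op == R_SHL_IMM;
}

/* 2. Condition: an extra C-level check on the matched operands. */
bool rule_condition(const struct rule_insn *in)
{
    return in->imm >= 0 && in->imm < 31;
}

/* 3. Preparation: compute the operand the replacement will use. */
int64_t rule_prepare(const struct rule_insn *in)
{
    return (int64_t)1 << in->imm;   /* shift count -> multiplier */
}

/* 4. Replacement: rewrite the instruction. Returns true if rewritten. */
bool rule_apply(struct rule_insn *in)
{
    if (!rule_match(in) || !rule_condition(in))
        return false;
    int64_t multiplier = rule_prepare(in);
    in->op = R_MUL_IMM;
    in->imm = multiplier;
    return true;
}
```

A shift by 3 becomes a multiply by 8, while a shift whose count fails the condition is left untouched.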
1.2. example
Suppose we want a peephole that rewrites `x<<3` as `x*8`:
int foo(int x) { int y = 8 * x; return y; }
After compilation, its RTL is:
$> /opt/riscv/bin/riscv64-unknown-linux-gnu-gcc $PWD/test.c -O2 -c -fdump-rtl-peephole2
$> cat test.c.304r.peephole2
...
(insn 6 7 13 2 (set (reg:DI 10 a0)
(sign_extend:DI (ashift:SI (reg:SI 10 a0 [76])
(const_int 3 [0x3])))) "/home/sunway/download/b/test.c":3:9 168 {*ashlsi3_extend}
(nil))
...
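We want a peephole to turn this `ashift 3` into `mult 8`.
There is one problem, though:
the RISC-V `mul` is an R-type instruction, i.e. both source operands are registers, so it seems the peephole would need to be written roughly like this: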
(define_peephole2
[
(set (match_operand:DI 0 "register_operand")
(sign_extend:DI (ashift:SI (match_operand:SI 1 "register_operand")
(match_operand 2 "const_int_operand"))))
]
""
[
(set (match_operand:DI 3 "register_operand") (sign_extend:DI (match_operand 1)))
(set (match_operand:DI 4 "register_operand") (match_operand 2))
(set (match_dup 0) (mult:DI (match_dup 3) (match_dup 4)))
]
{
operands[2] = GEN_INT (HOST_WIDE_INT_1 << INTVAL (operands[2]));
})
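However, define_peephole2 does not support `allocating` a new register through match_operand in the new-insn part: pass_peephole2 runs after pass_reload, i.e. register allocation (RA) is already finished, so no new pseudo register can be `allocated`.
1.2.1. Using hard registers
Although a pseudo register cannot be allocated through match_operand, using a hard register directly via `(reg:DI …)` does work: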
(define_peephole2
[
(set (match_operand:DI 0 "register_operand")
(sign_extend:DI (ashift:SI (match_operand:SI 1 "register_operand")
(match_operand 2 "const_int_operand"))))
]
""
[
(set (reg:DI 28) (sign_extend:DI (match_operand 1)))
(set (reg:DI 29) (match_operand 2))
(set (match_dup 0) (mult:DI (reg:DI 28) (reg:DI 29)))]
{
operands[2] = GEN_INT (HOST_WIDE_INT_1 << INTVAL (operands[2]));
})
The result after the transformation is:
$> cat test.c.304r.peephole2
...
(insn 24 7 25 2 (set (reg:DI 28 t3)
(sign_extend:DI (reg:SI 10 a0 [76]))) "/home/sunway/download/b/test.c":3:9 -1
(nil))
(insn 25 24 26 2 (set (reg:DI 29 t4)
(const_int 8 [0x8])) "/home/sunway/download/b/test.c":3:9 -1
(nil))
(insn 26 25 13 2 (set (reg:DI 10 a0)
(mult:DI (reg:DI 28 t3)
(reg:DI 29 t4))) "/home/sunway/download/b/test.c":3:9 -1
(nil))
...
$> /opt/riscv/bin/riscv64-unknown-linux-gnu-objdump -d ./test.o
0000000000000000 <foo>:
0: 00050e1b sext.w t3,a0
4: 4ea1 li t4,8
6: 03de0533 mul a0,t3,t4
a: 8082 ret
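As a sanity check, the original insn and the three-instruction replacement can be modeled in C. This is a sketch under the assumption that RTL arithmetic wraps modulo the mode width; the function names are made up for this example. It suggests that the two forms agree whenever `8 * x` fits in 32 bits, and differ in the upper bits when it does not (a case where the C source `8 * x` already overflows a signed `int`).

```c
#include <stdint.h>

/* Model of the original insn: (sign_extend:DI (ashift:SI a0 3)).
   The shift wraps at 32 bits and the 32-bit result is then sign-extended. */
int64_t model_shift_si_extend(int32_t a0)
{
    uint32_t shifted = (uint32_t)((uint32_t)a0 << 3);
    /* Sign-extend by hand to avoid an implementation-defined conversion. */
    return shifted >= 0x80000000u ? (int64_t)shifted - 4294967296LL
                                  : (int64_t)shifted;
}

/* Model of the replacement: sext.w t3,a0 ; li t4,8 ; mul a0,t3,t4. */
int64_t model_mul_di(int32_t a0)
{
    int64_t t3 = (int64_t)a0;   /* sext.w: sign-extend the 32-bit input */
    int64_t t4 = 8;             /* li */
    return t3 * t4;             /* 64-bit multiply, cannot overflow here */
}
```

For example, both functions return 40 for an input of 5 and -56 for -7, but for `0x10000000` the shift model yields -2147483648 while the multiply model yields 2147483648.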
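But this approach is not actually viable, because hard registers (t3, t4) must not be hardcoded in the pattern: nothing guarantees that they are free at the point where the peephole fires.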
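The danger of hardcoding t3 and t4 can be illustrated with a toy register file. This is a minimal sketch, assuming a value happens to be live in t3 when the rewritten sequence runs; `hw_regfile` and the `hw_*` functions are hypothetical names, and only the register numbers (a0 = 10, t3 = 28, t4 = 29) come from the dump above.

```c
#include <stdint.h>

/* Hypothetical register file with 32 integer registers. */
struct hw_regfile { int64_t x[32]; };

enum { HW_A0 = 10, HW_T3 = 28, HW_T4 = 29 };

/* Original instruction: only a0 is written.
   For small inputs, a0 << 3 has the same value as a0 * 8. */
void hw_run_original(struct hw_regfile *r)
{
    r->x[HW_A0] = r->x[HW_A0] * 8;
}

/* Replacement with hardcoded temporaries: t3 and t4 are overwritten,
   whether or not they held a live value. */
void hw_run_hardcoded(struct hw_regfile *r)
{
    r->x[HW_T3] = r->x[HW_A0];               /* sext.w t3,a0 (small input) */
    r->x[HW_T4] = 8;                         /* li t4,8 */
    r->x[HW_A0] = r->x[HW_T3] * r->x[HW_T4]; /* mul a0,t3,t4 */
}
```

Both versions leave the same result in a0, but if t3 held, say, 1234 before the sequence, the hardcoded version silently replaces it with the old value of a0.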
1.2.2. Using match_scratch
Modify the define_insn that generates the ashift so that it `reserves` a scratch reg via match_scratch, then let the peephole use that scratch reg.
Modify the md files:
riscv.md:
(define_insn "<optab>si3"
[
(parallel [
;; NOTE: add a match_scratch
(match_scratch:SI 3 "=r")
(set (match_operand:SI 0 "register_operand" "= r")
(any_shift:SI
(match_operand:SI 1 "register_operand" " r")
(match_operand:QI 2 "arith_operand" " rI")))])
]
""
{
if (GET_CODE (operands[2]) == CONST_INT)
operands[2] = GEN_INT (INTVAL (operands[2])
& (GET_MODE_BITSIZE (SImode) - 1));
return TARGET_64BIT ? "<insn>%i2w\t%0,%1,%2" : "<insn>%i2\t%0,%1,%2";
}
[(set_attr "type" "shift")
(set_attr "mode" "SI")])
peephole.md:
(define_peephole2
[
(parallel [
(match_scratch:SI 3)
(set (match_operand:SI 0 "register_operand")
(ashift:SI (match_operand:SI 1 "register_operand")
(match_operand 2 "const_int_operand")))
])
]
""
[
(set (match_dup 3) (match_operand 2))
(set (match_dup 0) (mult:SI (match_dup 1) (match_dup 3)))
]
{
operands[2] = GEN_INT (HOST_WIDE_INT_1 << INTVAL (operands[2]));
}
)
The generated RTL:
$> /opt/riscv/bin/riscv64-unknown-linux-gnu-gcc test.c -O2 -fdump-rtl-all -c
$> cat test.c.302r.compgotos
...
(insn 6 7 12 2 (parallel [
# new register allocated for the match_scratch
(reg:SI 15 a5 [77])
(set (reg:SI 10 a0 [orig:74 y ] [74])
(ashift:SI (reg:SI 10 a0 [76])
(const_int 3 [0x3])))
]) "test.c":3:9 150 {ashlsi3}
(expr_list:REG_DEAD (reg:SI 15 a5 [77])
(nil)))
...
$> cat test.c.304r.peephole2
...
(insn 24 7 25 2 (set (reg:SI 15 a5 [77])
(const_int 8 [0x8])) "test.c":3:9 -1
(nil))
(insn 25 24 12 2 (set (reg:SI 10 a0 [orig:74 y ] [74])
(mult:SI (reg:SI 10 a0 [76])
(reg:SI 15 a5 [77]))) "test.c":3:9 -1
(nil))
...
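The SImode rewrite in this dump can also be checked with a small C model. This is a sketch assuming 32-bit wraparound semantics for `ashift:SI` and `mult:SI`; the function names are invented for the example. Under that assumption the shift and the multiply give the same 32-bit result for every input.

```c
#include <stdint.h>

/* Model of (ashift:SI a0 3): 32-bit shift with wraparound. */
uint32_t si_shift3(uint32_t a0)
{
    return (uint32_t)(a0 << 3);
}

/* Model of the replacement: a5 = 8, then (mult:SI a0 a5),
   a 32-bit multiply with wraparound. */
uint32_t si_mul8(uint32_t a0)
{
    uint32_t a5 = 8;
    return (uint32_t)(a0 * a5);
}
```

Comparing the two functions on a few values, including ones whose top bits are shifted out such as `0x10000000` and `0xFFFFFFFF`, gives identical results.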
