GCC Peephole2

1. GCC Peephole2

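Peephole optimization is an architecture-specific RTL optimization. It is essentially a find-and-replace pass over insn patterns. For example, if the architecture has an inc instruction, a peephole can replace `add a0, a0, 1` with `inc a0`.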
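To make the find-and-replace idea concrete, here is a minimal sketch in plain C. It does not use GCC's RTL; the `struct insn` layout, the opcode names and `peephole_add_to_inc` are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction set invented for this sketch (not GCC's RTL). */
enum opcode { OP_ADD_IMM, OP_INC, OP_OTHER };

struct insn {
    enum opcode op;
    int rd;   /* destination register number */
    int rs;   /* source register number */
    long imm; /* immediate operand, only meaningful for OP_ADD_IMM */
};

/* Scan the instruction array and rewrite every `add rd, rd, 1`
   into `inc rd` in place.  Returns the number of rewrites. */
size_t peephole_add_to_inc(struct insn *code, size_t n)
{
    size_t rewritten = 0;

    assert(code != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct insn *p = &code[i];
        /* Pattern plus condition: same register on both sides, immediate 1. */
        if (p->op == OP_ADD_IMM && p->rd == p->rs && p->imm == 1) {
            p->op = OP_INC;
            p->imm = 0;
            rewritten++;
        }
    }
    return rewritten;
}
```

A real peephole pass looks at a short window of consecutive insns rather than a single one, but the shape is the same: match, check a condition, replace.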

1.1. define_peephole2

A define_peephole2 looks roughly like this:

(define_peephole2
       [insn-pattern-1
        insn-pattern-2
        ...]
       "condition"
       [new-insn-pattern-1
        new-insn-pattern-2
        ...]
       "preparation-statements")

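That is, the pass scans for the original insn patterns and, when the condition holds, replaces them with the new insns. During the replacement, the C code in `preparation-statements` can post-process the matched operands.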
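The four parts can be pictured as four steps operating on a shared `operands` array. The following is only a sketch of that flow in plain C: the structs, `rewrite_shift` and the range check used as the condition are hypothetical, and GCC's real operands are rtx values rather than integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch. */
struct shift_insn { int dest, src, amount; };      /* dest = src << amount */
struct mult_insn  { int dest, src; long factor; }; /* dest = src * factor  */

/* Returns 1 and fills *out if the rewrite applies, 0 otherwise. */
int rewrite_shift(const struct shift_insn *in, struct mult_insn *out)
{
    long operands[3];

    assert(in != NULL && out != NULL);

    /* 1. Match: bind the operands of the input pattern. */
    operands[0] = in->dest;
    operands[1] = in->src;
    operands[2] = in->amount;

    /* 2. Condition: give up if it does not hold (illustrative range check). */
    if (operands[2] < 0 || operands[2] > 30)
        return 0;

    /* 3. Preparation statements: adjust operands before substitution. */
    operands[2] = 1L << operands[2];

    /* 4. Substitute the operands into the new pattern. */
    out->dest = (int)operands[0];
    out->src = (int)operands[1];
    out->factor = operands[2];
    return 1;
}
```

The point of step 3 is that the new pattern sees the modified operand, not the one that was originally matched.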

1.2. Example

Suppose we want a peephole that rewrites `x<<3` into `x*8`:

int foo(int x) {
    int y = 8 * x;
    return y;
}

After compilation, its RTL is:

$> /opt/riscv/bin/riscv64-unknown-linux-gnu-gcc  $PWD/test.c -O2 -c -fdump-rtl-peephole2
$> cat test.c.304r.peephole2

...
(insn 6 7 13 2 (set (reg:DI 10 a0)
        (sign_extend:DI (ashift:SI (reg:SI 10 a0 [76])
                (const_int 3 [0x3])))) "/home/sunway/download/b/test.c":3:9 168 {*ashlsi3_extend}
     (nil))
...

The goal is to use a peephole to turn this `ashift 3` into a `mult 8`.
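The arithmetic behind the rewrite can be checked in plain C. This is a sketch with invented function names that models the RTL operations on fixed-width integers. It assumes the usual two's-complement wrap-around when converting an out-of-range `uint32_t` to `int32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplier equivalent to a left shift by `shift` bits, i.e. 2^shift.
   A 64-bit type is used so that shift == 31 does not overflow int. */
int64_t shift_to_multiplier(int shift)
{
    assert(shift >= 0 && shift < 32);
    return (int64_t)1 << shift;
}

/* Models (sign_extend:DI (ashift:SI x shift)):
   shift in 32 bits, then sign-extend to 64 bits. */
int64_t ashift_si_extend(int32_t x, int shift)
{
    uint32_t shifted = (uint32_t)x << shift; /* wraps modulo 2^32 */
    return (int64_t)(int32_t)shifted;
}

/* Models (sign_extend:DI (mult:SI x 2^shift)). */
int64_t mult_si_extend(int32_t x, int shift)
{
    uint32_t product = (uint32_t)x * (uint32_t)shift_to_multiplier(shift);
    return (int64_t)(int32_t)product;
}

/* Models (mult:DI (sign_extend:DI x) 2^shift):
   extend first, then multiply in 64 bits. */
int64_t mult_di_of_extend(int32_t x, int shift)
{
    return (int64_t)x * shift_to_multiplier(shift);
}
```

In this model the 32-bit shift and the 32-bit multiply always agree. Extending first and multiplying in 64 bits agrees only while the 32-bit result does not wrap: for x = 0x20000000 and a shift of 3, the first two functions return 0 and the last returns 4294967296.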

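There is one problem, though:

RISC-V `mul` is an R-type instruction, i.e. both source operands are registers, so the peephole apparently needs to look something like this: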

(define_peephole2
  ;; Input pattern: a sign-extended SImode shift by a constant.
  [(set (match_operand:DI 0 "register_operand")
        (sign_extend:DI
          (ashift:SI (match_operand:SI 1 "register_operand")
                     (match_operand 2 "const_int_operand"))))]
  ""
  ;; Replacement: operands 3 and 4 are meant to be fresh temporaries.
  [(set (match_operand:DI 3 "register_operand") (sign_extend:DI (match_operand 1)))
   (set (match_operand:DI 4 "register_operand") (match_operand 2))
   (set (match_dup 0) (mult:DI (match_dup 3) (match_dup 4)))]
  {
    /* Turn the shift count n into the multiplier 2^n.  */
    operands[2] = GEN_INT (1 << INTVAL (operands[2]));
  })

However, define_peephole2 does not support `allocating` a new register through match_operand in the new-insn part: pass_peephole2 runs after pass_reload, i.e. after register allocation (RA) has finished, so no new pseudo register can be `allocated`.

1.2.1. Using hard registers

Although match_operand cannot allocate a pseudo register, using a hard register directly via `(reg:DI …)` does work:

(define_peephole2
  [(set (match_operand:DI 0 "register_operand")
        (sign_extend:DI
          (ashift:SI (match_operand:SI 1 "register_operand")
                     (match_operand 2 "const_int_operand"))))]
  ""
  ;; Hard registers 28 (t3) and 29 (t4) are hard-coded as temporaries.
  [(set (reg:DI 28) (sign_extend:DI (match_operand 1)))
   (set (reg:DI 29) (match_operand 2))
   (set (match_dup 0) (mult:DI (reg:DI 28) (reg:DI 29)))]
  {
    /* Turn the shift count n into the multiplier 2^n.  */
    operands[2] = GEN_INT (1 << INTVAL (operands[2]));
  })

The result after the transformation is:

$> cat test.c.304r.peephole2

...
(insn 24 7 25 2 (set (reg:DI 28 t3)
        (sign_extend:DI (reg:SI 10 a0 [76]))) "/home/sunway/download/b/test.c":3:9 -1
     (nil))
(insn 25 24 26 2 (set (reg:DI 29 t4)
        (const_int 8 [0x8])) "/home/sunway/download/b/test.c":3:9 -1
     (nil))
(insn 26 25 13 2 (set (reg:DI 10 a0)
        (mult:DI (reg:DI 28 t3)
            (reg:DI 29 t4))) "/home/sunway/download/b/test.c":3:9 -1
     (nil))
...

$> /opt/riscv/bin/riscv64-unknown-linux-gnu-objdump -d ./test.o

0000000000000000 <foo>:
   0:   00050e1b                sext.w  t3,a0
   4:   4ea1                    li      t4,8
   6:   03de0533                mul     a0,t3,t4
   a:   8082                    ret

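This approach is not actually viable, though, because hard registers (t3, t4) must not be hard-coded in the pattern: nothing guarantees that they are free at the point of the rewrite.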
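Why hard-coding is unsafe can be illustrated with a toy liveness model in plain C. This sketch is not GCC code: the bitmask representation and both function names are assumptions made for illustration, with bit i of `live` set when register i currently holds a value that is still needed.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 32

/* A hard-coded register is only safe to overwrite if it is not live. */
int is_safe_to_clobber(uint32_t live, int reg)
{
    assert(0 <= reg && reg < NUM_REGS);
    return (live & (UINT32_C(1) << reg)) == 0;
}

/* Return the lowest-numbered register in [from, to] that is not live,
   or -1 if every register in that range is in use. */
int find_free_register(uint32_t live, int from, int to)
{
    assert(0 <= from && from <= to && to < NUM_REGS);
    for (int r = from; r <= to; r++) {
        if (is_safe_to_clobber(live, r))
            return r;
    }
    return -1;
}
```

If t3 (register 28) happens to be live, a pattern that always writes to it silently destroys a value, whereas a search over the live set either finds another register or reports that the rewrite cannot be done.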

1.2.2. Using match_scratch

Modify the define_insn that generates the ashift so that it `reserves` a scratch register through match_scratch, and then let the peephole use that scratch register.

Modify the machine description:

riscv.md:

(define_insn "<optab>si3"
  [
  (parallel [
  ;; NOTE: add a match_scratch
  (match_scratch:SI 3 "=r")
  (set (match_operand:SI     0 "register_operand" "= r")
    (any_shift:SI
        (match_operand:SI 1 "register_operand" "  r")
        (match_operand:QI 2 "arith_operand"    " rI")))])
  ]
  ""
{
  if (GET_CODE (operands[2]) == CONST_INT)
    operands[2] = GEN_INT (INTVAL (operands[2])
               & (GET_MODE_BITSIZE (SImode) - 1));

  return TARGET_64BIT ? "<insn>%i2w\t%0,%1,%2" : "<insn>%i2\t%0,%1,%2";
}
  [(set_attr "type" "shift")
   (set_attr "mode" "SI")])

peephole.md:

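(define_peephole2
  ;; Match the shift together with the scratch register reserved by the insn.
  [(parallel
     [(match_scratch:SI 3)
      (set (match_operand:SI 0 "register_operand")
           (ashift:SI (match_operand:SI 1 "register_operand")
                      (match_operand 2 "const_int_operand")))])]
  ""
  ;; Load the multiplier into the scratch register, then multiply.
  [(set (match_dup 3) (match_operand 2))
   (set (match_dup 0) (mult:SI (match_dup 1) (match_dup 3)))]
  {
    /* Turn the shift count n into the multiplier 2^n.  */
    operands[2] = GEN_INT (1 << INTVAL (operands[2]));
  })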

The generated RTL:

$> /opt/riscv/bin/riscv64-unknown-linux-gnu-gcc  test.c -O2 -fdump-rtl-all -c
$> cat test.c.302r.compgotos
...
(insn 6 7 12 2 (parallel [
            ;; the new register allocated for the match_scratch
            (reg:SI 15 a5 [77])
            (set (reg:SI 10 a0 [orig:74 y ] [74])
                (ashift:SI (reg:SI 10 a0 [76])
                    (const_int 3 [0x3])))
        ]) "test.c":3:9 150 {ashlsi3}
     (expr_list:REG_DEAD (reg:SI 15 a5 [77])
        (nil)))
...

$> cat test.c.304r.peephole2
...
(insn 24 7 25 2 (set (reg:SI 15 a5 [77])
        (const_int 8 [0x8])) "test.c":3:9 -1
     (nil))
(insn 25 24 12 2 (set (reg:SI 10 a0 [orig:74 y ] [74])
        (mult:SI (reg:SI 10 a0 [76])
            (reg:SI 15 a5 [77]))) "test.c":3:9 -1
     (nil))
...

Author: [email protected]
Date: 2022-04-22 Fri 19:58
Last updated: 2022-04-24 Sun 14:21

Creative Commons License