Reverse-engineering asm using sub / cmp / setbe back to C? My attempt is compiling to branches

Question

This is the assembly code I'm supposed to translate. f1:

subl    $97, %edi
xorl    %eax, %eax
cmpb    $25, %dil
setbe   %al
ret

Here's the C code I wrote, which I believe is equivalent.

int f1(int y){

  int x = y-97;
  int i = 0;

  if(x<=25){
    x = i;
  }
  return x;
}

And this is what I get from compiling my C code.

_f1:                                    ## @f1
.cfi_startproc
## %bb.0:

pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq    %rsp, %rbp
.cfi_def_cfa_register %rbp
                  ## kill: def %edi killed %edi def %rdi
leal    -97(%rdi), %ecx
xorl    %eax, %eax
cmpl    $123, %edi
cmovgel %ecx, %eax
popq    %rbp
retq
.cfi_endproc

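I'd like to know whether this is correct and what should be different. It would also help if someone could explain how jmps work, because I'm trying to translate this second piece of assembly too and have gotten stuck. f2: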

cmpl    $1, %edi
jle .L6
movl    $2, %edx
movl    $1, %eax
jmp .L5
.L8:
movl    %ecx, %edx
.L5:
imull   %edx, %eax
leal    1(%rdx), %ecx
cmpl    %eax, %edi
jg  .L8
.L4:
cmpl    %edi, %eax
sete    %al
movzbl  %al, %eax
ret
.L6:
movl    $1, %eax
jmp .L4

Answer 1

gcc8.3 -O3 emits exactly the asm in the question for this way of writing the range check, which uses the unsigned-compare trick.

int is_ascii_lowercase_v2(int y){
    unsigned char x = y-'a';
    return x <= (unsigned)('z'-'a');
}

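Narrowing to 8 bits after the int subtraction matches the asm more exactly, but it isn't needed for correctness, nor even to convince the compiler to use a 32-bit sub. For unsigned char y, the upper bytes of RDI are allowed to hold arbitrary garbage (x86-64 System V calling convention), but with sub and add, carry only propagates from low to high.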

The low 8 bits of the result (which is all the cmp reads) are the same with either sub $'a', %dil or sub $'a', %edi.
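The low-to-high carry argument can be checked with a minimal sketch. The function names and the sample "garbage" patterns are hypothetical, chosen only for illustration; the first function models a 32-bit subtract followed by reading the low byte, the second an 8-bit subtract.

```c
#include <stdint.h>

/* Models `subl $97, %edi` followed by reading only %dil. */
static uint8_t low_byte_of_sub32(uint32_t edi) {
    return (uint8_t)(edi - 97u);
}

/* Models `subb $97, %dil`: the upper bits never take part. */
static uint8_t sub8(uint32_t edi) {
    return (uint8_t)((uint8_t)edi - 97u);
}

/* Hypothetical helper: counts disagreements over every low byte,
   combined with a few arbitrary patterns in the upper 24 bits. */
static unsigned count_low_byte_mismatches(void) {
    static const uint32_t garbage[] = {
        0x00000000u, 0xFFFFFF00u, 0x12345600u, 0x80000000u
    };
    unsigned mismatches = 0;
    for (unsigned g = 0; g < sizeof garbage / sizeof garbage[0]; g++) {
        for (uint32_t low = 0; low < 256; low++) {
            uint32_t edi = garbage[g] | low;
            if (low_byte_of_sub32(edi) != sub8(edi))
                mismatches++;
        }
    }
    return mismatches;
}
```

Calling count_low_byte_mismatches() from a small test harness should return 0, since a borrow out of the low byte can change the upper bits but never the low byte itself.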

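Writing it as a normal range check also gets gcc to emit identical code, because compilers know how to optimize range checks. (And gcc chooses 32-bit operand size for the sub, unlike clang, which uses 8-bit.)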

int is_ascii_lowercase_v3(char y){
    return (y>='a' && y<='z');
}

On the Godbolt compiler explorer, this and _v2 compile as follows:

## gcc8.3 -O3
is_ascii_lowercase_v3:    # and _v2 is identical
    subl    $97, %edi
    xorl    %eax, %eax
    cmpb    $25, %dil
    setbe   %al
    ret
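That the two source forms agree can also be checked directly in C. The following is a minimal sketch, assuming an ASCII execution character set; the function names are hypothetical stand-ins for the _v3 and _v2 functions shown earlier.

```c
#include <limits.h>

/* Plain two-sided range check, as in _v3. */
static int range_check(char y) {
    return y >= 'a' && y <= 'z';
}

/* Unsigned-compare trick, as in _v2: anything below 'a' wraps around
   to a large unsigned value, so a single compare covers both bounds. */
static int unsigned_trick(int y) {
    unsigned char x = y - 'a';
    return x <= (unsigned)('z' - 'a');
}

/* Hypothetical helper: counts disagreements over every possible char value. */
static int count_range_check_mismatches(void) {
    int mismatches = 0;
    for (int y = CHAR_MIN; y <= CHAR_MAX; y++) {
        if (range_check((char)y) != unsigned_trick(y))
            mismatches++;
    }
    return mismatches;
}
```

The comparison is limited to the range of char on purpose: for int arguments outside that range, the 8-bit truncation in the trick makes the two forms differ.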

Returning the compare result as an integer, instead of using an if, matches the asm much more naturally.

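But even writing it "branchlessly" in C won't match the asm unless you enable optimization. The default code-gen from gcc / clang is -O0: anti-optimized for consistent debugging, storing and reloading everything to memory between statements (and function args on function entry). You need optimization, because -O0 code-gen is (intentionally) mostly braindead and nasty to look at. See How to remove "noise" from GCC/clang assembly output?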

## gcc8.3 -O0
is_ascii_lowercase_v2:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    %edi, -20(%rbp)
    movl    -20(%rbp), %eax
    subl    $97, %eax
    movb    %al, -1(%rbp)
    cmpb    $25, -1(%rbp)
    setbe   %al
    movzbl  %al, %eax
    popq    %rbp
    ret

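With optimization enabled, gcc and clang will do if-conversion of branchy code into branchless code when that's efficient. For example: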

int is_ascii_lowercase_branchy(char y){
    unsigned char x = y-'a';
    if (x <= 25U) {
        return 1;
    }
    return 0;
}

This still compiles to the same asm with GCC8.3 -O3:

is_ascii_lowercase_branchy:
    subl    $97, %edi
    xorl    %eax, %eax
    cmpb    $25, %dil
    setbe   %al
    ret

We can tell that the optimization level was at least gcc -O2. At -O1, gcc uses the less efficient setbe / movzx instead of xor-zeroing EAX ahead of setbe:

is_ascii_lowercase_v2:
    subl    $97, %edi
    cmpb    $25, %dil
    setbe   %al
    movzbl  %al, %eax
    ret

I could never get clang to reproduce exactly the same sequence of instructions. It likes to use add $-97, %edi, and cmp with $26 / setb.

Or it will do really interesting (but sub-optimal) things like this:

# clang7.0 -O3
is_ascii_lowercase_v2:
    addl    $159, %edi    # 256-97 = 8-bit version of -97
    andl    $254, %edi    # 0xFE; I haven't figured out why it's clearing the low bit as well as the high bits
    xorl    %eax, %eax
    cmpl    $26, %edi
    setb    %al
    retq

So this is something involving -(x-97), perhaps using the 2's complement identity (-x = ~x + 1) somewhere in there.
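Whatever clang's reason for the extra masking, the add / and / cmp sequence can be modeled in C and compared against the sub / cmpb / setbe sequence. This is a minimal sketch with hypothetical function names; it only checks that the two listings compute the same result and says nothing about why clang picks this form. Clearing the low bit does not change the outcome because 26 is even: x < 26 holds exactly when (x & ~1) < 26.

```c
#include <stdint.h>

/* Models the clang 7.0 listing: addl $159 / andl $254 / cmpl $26 / setb. */
static int clang_sequence(uint32_t edi) {
    edi += 159u;        /* 159 == 256 - 97, i.e. -97 modulo 256 */
    edi &= 254u;        /* keep the low byte, with bit 0 cleared */
    return edi < 26u;   /* unsigned below, as setb tests */
}

/* Models the gcc listing: subl $97 / cmpb $25, %dil / setbe. */
static int gcc_sequence(uint32_t edi) {
    return (uint8_t)(edi - 97u) <= 25u;
}

/* Hypothetical helper: counts disagreements over two sample ranges,
   one starting at 0 and one ending at UINT32_MAX. */
static unsigned count_sequence_mismatches(void) {
    unsigned mismatches = 0;
    for (uint32_t edi = 0; edi < 0x20000u; edi++) {
        if (clang_sequence(edi) != gcc_sequence(edi))
            mismatches++;
    }
    /* This loop stops when edi wraps around to 0. */
    for (uint32_t edi = 0xFFFF0000u; edi != 0; edi++) {
        if (clang_sequence(edi) != gcc_sequence(edi))
            mismatches++;
    }
    return mismatches;
}
```

Calling count_sequence_mismatches() from a test harness should return 0.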

Answer 2

Here is an annotated version of the assembly:

# %edi is the first argument, we denote x
subl $97, %edi
# x -= 97

# %eax is the return value, we denote y
xorl %eax, %eax
# y = 0

# %dil is the least significant byte (lsb) of x
cmpb $25, %dil

# %al is lsb(y) which is already zeroed
setbe %al
# if lsb(x) <= 25 then lsb(y) = 1
# setbe is unsigned version, setle would be signed

ret
# return y

So a verbose C equivalent is:

int f(int x) {
  int y = 0;
  x -= 97;
  x &= 0xFF; // x = lsb(x) using 0xFF as a bitmask
  y = (unsigned)x <= 25; // Section 6.5.8 of C standard: comparisons yield 0 or 1
  return y;
}

We can shorten it by realizing that y is unnecessary:

int f(int x) {
  x -= 97;
  x &= 0xFF;
  return (unsigned)x <= 25;
}

This matches the assembly exactly on the Godbolt Compiler Explorer (x86-64 gcc8.2 -O2): https://godbolt.org/z/fQ0LVR
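The note in the annotated listing that setbe is the unsigned condition (setle would be the signed one) is what separates this from a plain signed x <= 25 test. A minimal sketch of the difference, with hypothetical function names:

```c
/* Unsigned compare on the low byte, matching cmpb / setbe. */
static int with_unsigned_compare(int y) {
    unsigned char x = (unsigned char)(y - 97);
    return x <= 25;
}

/* Signed int compare: every input below 'a' also passes,
   because y - 97 is negative and therefore <= 25. */
static int with_signed_compare(int y) {
    int x = y - 97;
    return x <= 25;
}
```

For 'A' (65), y - 97 is -32: the signed version returns 1, while the unsigned version sees 224 and returns 0. Both agree for 'm' (1) and '{' (0).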
