使用 add / jnc 循环计算给定整数的二进制表示中非前导 0 的数量

Question

我是汇编语言的新手，我有以下问题：

int main(){
  int x = 2023; //nine 1 and two 0 as in bin it's - 11111100111
  int y;
  
  asm(
    "mov eax, %1 ;"
    "mov ecx, 32 ;"
    "mov ebx, 0 ;"
    "loop:"
      "add eax, eax ;"
      "jnc next ;"
      "inc ebx ;"
      "next :"
      "dec ecx ;"
      "jnz loop;"
    "mov %0, ebx ;"

    : "=r" (y)
    : "r"(x)
    : "eax", "ebx", "ecx"
  );

代码编译使用

-masm=intel

.

我想让这个程序计算 x 的二进制表示中 0 的个数。现在它可以在其中计数 1 但我无法弄清楚需要更改的内容。我试图改变几个功能，但它要么给我错误的价值，要么价值相同。

并澄清给定的代码。现在它给出的结果很好，因为它在 2023 年计数为 1，但我想以某种方式反转它以使其计数为 0 而不是 1。我通过更改

add

、

inc

、

dec

等尝试了几种方法，但是不知何故结果是 23 或 41（不知道如何）

Answer 1

一个与您已经编写的代码保持接近的解决方案。（显然，您遵循关于“使用 add / jnc 循环”的约束。）

L1循环首先定位EAX寄存器中设置的最高位
L2 循环稍后根据要求计算嵌入的零位

                     ; "mov eax, %1 ;"
    mov  ebx, 0
    mov  ecx, 32
L1: add  eax, eax
    jc   L3          ; Found highest 1 bit
    dec  ecx
    jnz  L1
    jmp  L4          ; EAX was 0
L2: add  eax, eax
    jc   L3
    inc  ebx         ; Count zeroes
L3: dec  ecx
    jnz  L2
L4:                  ; "mov %0, ebx ;"

Answer 2

unsigned int count_nonleading_zero_bits( unsigned int n ) {
   unsigned int count = 0;
   while ( n ) {
      if ( !( n & 1 ) )
         ++count;

      n >>= 1;
   }

   return count;
}

https://godbolt.org/z/asds67nbr

count_nonleading_zero_bits:
        xor     eax, eax
        test    edi, edi
        je      .L5
.L4:
        mov     edx, edi
        and     edx, 1
        cmp     edx, 1
        adc     eax, 0
        shr     edi
        jne     .L4
.L5:
        ret

Answer 3

对于带有 BMI1 的现代 x86，您需要使用

32-popcnt(x)

来计算整个寄存器中的所有零，然后减去

lzcnt(x)

，leading 零的数量，即不属于有效数字的部分你的二进制整数。

在 GNU C 中，看起来像

int count_significant_zeros(unsigned x)
{
    if (!x)
        return 0;  // __builtin_clz has an undefined result for input 0
    
    return sizeof(x)*CHAR_BIT - __builtin_popcount(x) - __builtin_clz(x);
    // assume no padding bits.  Will be 32 when compiling for x86.
    // use popcountll and clzll for 64-bit integers.
}

如果没有

-mpopcnt

或 SSE4.2，GCC 和 clang (Godbolt) 使用 an SWAR bithack for

__builtin_popcount

，以及

bsr(x) ^ 31

作为

31-bsr(x)

前导零计数的优化。 Clang 实际上使用

popcount(~x)

作为

32-popcount(x)

的优化，在 bithack 之前使用

not

指令。做

32 - popcnt(x) - (31-bsr(x))

可能至少和

1 + bsr(x) - popcnt(x)

一样有效。

bsr

使目标寄存器保持不变以输入 0，因此您需要对其进行特殊处理。（AMD 对此进行了记录，英特尔将结果记录为“未定义”，不像

lzcnt

明确定义为产生 32）。无需额外检查即可使用

bsr

是 GNU C 将

__builtin_clz

定义为在这种情况下具有未定义结果的原因。（https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html）

C++20 版本看起来像这样，使用

std::countl_zero

计算左边的零（前导），为输入 0 生成类型宽度，因此我们不需要检查这种特殊情况我们自己。这让编译器在像

lzcnt

这样的指令可用时做出更好的 asm，不需要特殊情况。

#include <bit>
#include <limits>
int count_significant_zeros_cpp20(unsigned x)
{
    return std::numeric_limits<decltype(x)>::digits - std::popcount(x) - std::countl_zero(x);
}

来自 clang 16

-O2 -march=haswell

用于 x86-64：

count_significant_zeros_cpp20(unsigned int):
        popcnt  eax, edi        # missed optimization: false dependency could be avoided with popcnt edi,edi
        lzcnt   ecx, edi        # Haswell also has a false output dependency on lzcnt, although Skylake doesn't.
        add     ecx, eax
        mov     eax, 32
        sub     eax, ecx
        ret

如果你想为此使用内联汇编，你可以这样做

int count_significant_zeros_asm(unsigned x)
{
  int retval, dummy;
  asm(
    "lzcnt  %[dummy], %[x] \n\t"     // false dependency on Intel before Skylake; could xor-zero [dummy] first to break it
    "popcnt %[x], %[x] \n\t"         // avoid false dependencies on Intel before Ice Lake
    "mov    %[ret], 32 \n\t"
    "sub    %[ret], %[dummy] \n\t"   // latency of this can overlap with popcnt latency
    "sub    %[ret], %[x] \n\t"

    : [ret]"=r" (retval), [dummy]"=&r"(dummy),  // let the compiler pick registers, dummy being an early-clobber output written before we read x for the last time.  Although [x] is an RMW inout operand so I don't think it can overlap
      [x]"+r"(x)                               // modify the input reg in place
    : // no pure inputs
    : // no clobbers
  );
  return retval;
  // dummy is also available with clz(x) in case you want it.
}

如果你想遍历位，用

shr

右移更有意义；一旦值变为

，您就可以停止循环，就像ikegami 对社区维基答案的重写一样。（这最初只是一个

std::popcount()

计数，甚至没有从 32 倒数到零。你可以用

sbb reg, 0

来做到这一点。）

你用

add

向左移动，这意味着你必须循环遍历固定宽度寄存器中的前导零，然后才能到达整数的前导

。

顺便说一句，clang 14 及更高版本在

-masm=intel

语句中支持

asm("")

for Intel 语法；在此之前，它只影响输出语法，而不影响其内置汇编器如何处理 asm 语句。

Answer 4

备选方案：

unsigned int count_nonleading_zero_bits(unsigned int n) {
   unsigned int count = 0;
   while (n) {
      count += 1 - (n & 1);
      n >>= 1;
   }
   return count;
}

x86_64 组装：

count_nonleading_zero_bits:
        xor     edx, edx
        test    edi, edi
        je      .L1
.L3:
        mov     eax, edi
        not     eax
        and     eax, 1
        add     edx, eax
        shr     edi
        jne     .L3
.L1:
        mov     eax, edx
        ret

手动优化（不一定更快）：

count_nonleading_zero_bits:
        xor     eax, eax   ;; set CF=0
.L3:
        adc     eax, 0
        shr     edi
        cmc
        jne     .L3
        ret

Answer 5

感谢大家不仅提供代码方面的帮助，还提供解释方面的帮助！我真的很感激它。

这是更改后的代码（谢谢 Sep Roland），添加了解释其内部工作的注释。

  asm(
    "mov eax, %1 ;"           // register where binary representation of int x is stored
    "mov ebx, 0 ;"            // register where the result will be stored
    "mov ecx, 32 ;"           // loop counter
    "loop1:"                  // check and move to first one in the binary value
      "add eax, eax ;"        // add value to itself
      "jc loop3 ;"            // find the first one - jump to loop3 to decrease the counter
      "dec ecx ;"             // decrement the counter
      "jnz loop1 ;"           // if counter is not zero, repeat the loop
      "jmp loop4 ;"           // jump to l4 when eax = 0, when the counter has been zeroed (case where there are no 0s in the register)
    "loop2:"                  // main loop checking the number of zeros
      "add eax, eax ;"        // add value to itself
      "jc loop3 ;"            // if carry occurs, jump to loop3
      "inc ebx ;"             // increment the value holding the number of zeros
    "loop3:"                  // check counter status
      "dec ecx ;"             // decrement the counter
      "jnz loop2 ;"           // if counter is not zero, jump to loop2
    "loop4:"
      "mov %0, ebx ;"         // end of operation

    : "=r" (y)
    : "r"(x)
    : "eax", "ebx", "ecx"
  );

为了澄清我为什么选择这个。这是因为长度和易于理解对我来说似乎是最好的。我之前说过，我刚刚开始学习这种语言，它与我习惯的高级语言非常不同。

对不起英语，因为它是我的第二语言。

使用 add / jnc 循环计算给定整数的二进制表示中非前导 0 的数量

问题描述投票：0回答：5

5个回答

最新问题

使用 add / jnc 循环计算给定整数的二进制表示中非前导 0 的数量

问题描述 投票：0回答：5

5个回答

最新问题

问题描述投票：0回答：5