How do I get this assembly branching logic to work?

Question

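I've been trying to implement a multiplication program in ARMv8 assembly. I've been able to get the skeleton working, but for some reason it doesn't multiply correctly.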

    .initial:      .string "multiplier = 0x%08x (%d) multiplicand = 0x%08x (%d)\n\n"
    .product1:     .string "product = 0x%08x multiplier = 0x%08x\n\n"
    .result1:      .string "64-bit result = 0x%016llx (%lld)\n\n" // Corrected format specifier

    define(FALSE, 0)
    define(TRUE, 1)
    define(multiplier, w18)
    define(multiplicand, w19)
    define(product, w20)
    define(i, w21)
    define(negative, w22)
    define(result, x18)
    define(temp1, x19)
    define(temp2, x20)
    define(product64, x21)
    define(multiplier64, x22)
    define(multiplicand64, x23)

    .balign 4
    .global main
    .text

main:
    stp     x29, x30, [sp, -16]!
    mov     x29, sp

    mov     multiplicand, -30
    mov     multiplier, 70
    mov     product, 0
    mov     i, 0
print1:
    adrp    x0, .initial
    add     x0, x0, :lo12:.initial
    mov     w1, multiplier
    mov     w2, multiplier
    mov     w3, multiplicand
    mov     w4, multiplicand
    bl      printf

multiplier_check:
    cmp     multiplier, 0
    b.ge    Loop1
    mov     negative, TRUE
Loop1:
    add     i, i, 1
    cmp     i, 32
    b.gt    end
    tst     multiplier, 0x1
    b.eq    Loop3
    asr     multiplier, multiplier, 1

Loop3:
    add     product, product, multiplicand

Loop2:
    tst     product, 0x1
    b.ne    Loop4
    orr     multiplier, multiplier, 0x80000000
Loop4:
    and     multiplier, multiplier, 0x7FFFFFFF
Loop5:
    asr     product, product, 1

negative_check:
    cmp     negative, TRUE
    b.eq    Loop6
    b       print2
Loop6:
    sub     product, product, multiplicand
    b Loop5 // Corrected loop condition

print2:
    adrp    x0, .product1
    add     x0, x0, :lo12:.product1
    mov     w1, product
    mov     w2, multiplier
    bl      printf

    sxtw    product64, product
    sxtw    multiplier64, multiplier
    mul     result, product64, multiplier64 // Corrected multiplication for 64-bit result

    adrp    x0, .result1
    add     x0, x0, :lo12:.result1
    mov     x1, result
    mov     x2, result
    bl      printf
end:
    mov     w0, 0
    ldp     x29, x30, [sp], 16
    ret

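Normally it should print the product and multiplicand in 8-digit hex, followed by the 64-bit result, but instead of the correct output I keep getting: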

multiplier = 0x00000046 (70) multiplicand = 0xffffffe2 (-30)

product = 0xfffffff1 multiplier = 0x00000046

64-bit result = 0xffffffffffffffbe6 (-1050)

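To fix this, I tried changing the branching logic to see what was going wrong, but I keep getting the same output. (Although I am now able to get the correct sign.) Can anyone help me see where the logic goes wrong?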

Edit: To clarify, these are the commands I use to build and run it.

    m4 assign2a.asm > assign2a.s
    gcc assign2a.s -o e.o
    ./e.o

Answer 1

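I haven't gone through your whole program, but here are some comments. Some are definitely bugs (though they may or may not be the specific bug causing your problem), and some are just sloppy coding.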

General coding style and practices

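Comment your code! Comments should explain not only what the code is doing but, more importantly, why. Use more descriptive label names than `Loop1`, `Loop2`, etc. What happens when you branch to this label? Name it accordingly.

Use a debugger to single-step through your code and see what it actually does, checking the register contents as you go. If you understand the code well, you should always know what values "should" be in all the registers. When you find that they differ from what you expected, you have found a bug. It looks like you are building and running your code natively on some aarch64 Unix-like system, which means you can also run `gdb` or `lldb` natively.

Defining symbolic names for all your registers is nice in some ways, but choosing overlapping registers is risky. (You do know that, for example, `w18` is the low 32 bits of `x18`?) So `multiplier` and `result` conflict with each other, and writing to one overwrites the other; the same goes for all the other pairs. This sounds like a source of bugs.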
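
Calling convention violations

Registers that are not callee-saved (which, among the registers used here, means `x18`) may be overwritten when you call `printf` or any other C function.

Registers x19-x28 are guaranteed not to be clobbered by function calls, but by the same token, your function (even if it is `main`) must not clobber them either. So push the ones you use in `main` onto the stack on entry and restore them before returning.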

Branch logic bugs

    multiplier_check:
        cmp     multiplier, 0
        b.ge    Loop1
        mov     negative, TRUE
    Loop1:

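The `negative` register (which is really `w22`) has not been initialized before this point. So if `multiplier` is non-negative and the branch is taken, `negative` keeps its previous garbage value, which is not necessarily 0.

In this case (as in many other places in your code), the branch can be replaced with a conditional select. You can simply do

        cmp     multiplier, 0
        cset    negative, lt

which sets the `negative` register to `0` or `1` according to whether the condition code `lt` (the complement of `ge`) is false or true.

(Actually, this can be optimized further into the single instruction `lsr negative, multiplier, 31`, which shifts the sign bit of `multiplier` into the low bit of `negative`.)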
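
    Loop2:
        tst     product, 0x1
        b.ne    Loop4
        orr     multiplier, multiplier, 0x80000000
    Loop4:
        and     multiplier, multiplier, 0x7FFFFFFF

This looks wrong. If the branch is not taken, the `orr` sets the sign bit of `multiplier`, but then you fall through to the `and` instruction, which clears that bit again.

You could put an unconditional branch after the `orr` to skip the `and`. However, the whole sequence can also be replaced with a bitfield insert:

        bfi     multiplier, product, 31, 1

which inserts the low bit of `product` into bit 31 of `multiplier`.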