在不使用乘法器的情况下，以2 ^ 8为基础加速大型模块化乘法

Question

我目前正在将nacl库转换为risc-v。我已经有poly1305工作。我正在尝试使用risc-v核心指令集来执行此操作，因此我没有乘法器。 Pol1305的算法正在使用ceil（m / 16）* 17 * 17 8位乘法，其中m是消息长度（以字节为单位）（两个2 ^ 130整数乘以2 ^ 8以2 ^ 130-5为基数）。因此，我想使用快速乘法算法来保持快速。

目前，我有用于乘法的移位加法算法。但是，对于8位值，这需要63个周期，因为我需要避免分支（定时侧通道），因此涉及一些需要更多周期的屏蔽。

    andi  t2, t0, 1     //t0 is the multiplier
    sub   t2, zero, t2  //creating a mask
    and   t3, t1, t2    //applying the mask to the multiplicand
    add   a0, a0, t3    //doing the add
    srli  t0, t0, 1     //shifting the multiplier
    slli  t1, t1, 1     //shifting the multiplicand

这给我有效的结果，每个乘法有63个周期。问题在于，对于131字节的消息，程序的总执行时间为175219个周期。此时，将9 * 17 * 17 * 63 = 163863个周期用于乘法。我想改进。

Answer 1

这里有一些改进的代码。这是基于Patterson和Hennessy教科书中显示的算法。

    // initialization
    add   a0, zero, t0  //copy multiplier to the product
    slli  t1, t1, 8     //shift the multiplicand to the left half of a 16bit register

    // repeat 8 times from here
    andi  t2, a0, 1     //the right half of a0 is the multiplier
    sub   t2, zero, t2  //creating a mask
    and   t3, t1, t2    //applying the mask to the multiplicand
    add   a0, a0, t3    //doing the add to the left half of the product
    srli  a0, a0, 1     //shifting the product
    // to here

此外，您可以通过重复上述代码8次而不是按分支/跳转来循环来应用循环展开方法。

在算法级别，唐津算法可以减少多精度算术中8位乘法的数量。

在不使用乘法器的情况下，以2 ^ 8为基础加速大型模块化乘法

问题描述投票：5回答：1

1个回答

最新问题

在不使用乘法器的情况下，以2 ^ 8为基础加速大型模块化乘法

问题描述 投票：5回答：1

1个回答

最新问题

问题描述投票：5回答：1