为什么 blas gemm 函数系列中不允许非正向步幅？

Question

sgemm的netlib文档指出，数组步长

LDA

和

LDB

必须是

>= 1

，并且足够大，以便列不会重叠。事实上，Apple 的 Accelerate/veclib 框架中的实现会检查这些条件，如果违反则存在。

我真的不明白为什么会有这个限制。 BLAS 不能简单地相信我真的想要零步幅或负步幅吗？据我了解，Fortran 整数默认是带符号的，因此参数类型似乎不是原因（免责声明：我不太了解 Fortan）。

确实，存在非正数组步幅的非常合理的用例：

零步幅：在多维数组类中，零步幅可以启用 numpy 风格的广播。
负步幅：负步幅允许沿任意轴以相反顺序查看数组，而无需复制。这可能很有用，例如当翻转卷积核时，和卷积可以使用gemm有效地实现。或者，可以翻转图像的垂直轴，这很方便，因为存在不同的约定：在 postscript/pdf 中轴指向上方，在 png 格式（以及许多其他格式）中轴指向下方。

我对两个方面感兴趣：

我想了解为什么存在这种限制。难道真的只是因为 BLAS 的设计者没有考虑到这样的用例吗？我是否是某人试图捕获确实是一项功能的错误的受害者？或者限制会带来更好的性能吗？我很难想象后者。
有没有办法在不牺牲（太多）性能的情况下解决该限制？目前我需要一些可以在 Mac 上使用 C++ 运行的东西，但从长远来看，它仍然应该基于 BLAS，因此它可以跨平台运行。

Answer 1

我最近发现自己有效地做到了这一点：

double s[4] = {10., 2., 3.1, 4.1};
dscal_(4, 3., s, -1);
assert( s[1] == 2.*3. );

dscal_

是最简单的BLAS函数，将数组乘以标量，其签名是：

void sscal(int, double, double*, int); // doesn't seem to return anything useful

在我的 BLAS 特定发行版（Fedora 28 附带）中，这给出了一个无声错误，因为该函数似乎没有执行任何操作。此外，

dscal_

甚至似乎没有返回错误代码，因此如果没有包装函数，我无法捕获此错误（我的数组具有运行时正或负步幅，但我无法在所有情况下控制该值）。

所有这些案例都失败了：

double s[4] = {10., 2., 3.1, 4.1};
dscal_(4, 3., s, -1); // negative incx does nothing
dscal_(4, 3., &s[4], -1); // negative incx does nothing
dscal_(-4, 3., &s[4], 1); // negative n does nothing
dscal_(-4, 3., &s[4], -1); // negative n or incx does nothing
assert( s[1] == 2. );

这告诉我，虽然可能没有在任何地方记录，但步幅 (

incx

) 必须是正值（还有大小）。幸运的是，对于许多 BLAS 函数来说，调用可以转换为正步幅。我需要一个包装函数来以任何方式调试它，所以编写了以下包装函数：

void signed_dscal(int n, double alpha, double* x, int incx){
    int prod = incx*n;
    if(prod > 0) dscal(abs(n), alpha, x, abs(incx));
    else         dscal(abs(n), alpha, x + prod, abs(incx));
}

通过这种方式，我可以用正或负的步幅和大小来调用

signed_dscal

。

int main(){
{
    double d[4] = {10., 2., 3.1, 4.1};
    signed_dscal(4, 3., d, 1);
    assert( d[1] == 6. );
}
{
    double d[4] = {10., 2., 3.1, 4.1};
    signed_dscal(4, 3., &d[4], -1);
    assert( d[1] == 6. );
}
{
    double d[4] = {10., 2., 3.1, 4.1};
    signed_dscal(-4, 3., &d[4], 1);
    assert( d[1] == 6. );
}
{
    double d[4] = {10., 2., 3.1, 4.1};
    signed_dscal(-4, 3., d, -1);
    assert( d[1] == 6. );
}    
    return 0;
}

（注意

incx=0

仍然是无法修改的情况。）

我还是不明白这背后的逻辑是什么。也许 BLAS 的某些实现将允许您默认执行此操作，而其他实现将尝试防止无限循环，其副作用是假设正步幅值。

Intel BLAS 似乎暗示它支持负步幅，因为它提到

abs(incx)

，至少对于 AXPY：https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-参考-c/top/blas-and-sparse-blas-routines/blas-routines/blas-level-1-routines-and-functions/cblas-axpy.html

我没有尝试，是否适用于其他功能还有待观察。

Answer 2

关于零步幅的情况：我正在使用一个 C++ 数组库（免责声明：我自己的），它支持广播并且还支持 BLAS 接口。我可以快速尝试很多情况，看看哪些 BLAS 版本支持零步幅。

这是

axpy

的示例，在Linux中至少我可用的所有BLAS版本都支持零步幅（广播数组）。

#include<multi/array.hpp>  // from https://gitlab.com/correaa/boost-multi
#include<multi/adaptors/blas.hpp>

namespace multi = boost::multi;

int main() {
    multi::array<double, 1> y = {1.0, 2.0, 3.0};

    // regular arrays, stride != 0
    multi::array<double, 1> const x = {1.0, 1.0, 1.0};

    multi::blas::axpy(1.0, x.begin(), y);

    assert(y[0] == 2.0);
    assert(y[1] == 3.0);
    assert(y[2] == 4.0);

    // broadcasted array, stride == 0
    multi::array<double, 0> const x0(1.0);

    multi::blas::axpy(1.0, x0.broadcasted().begin(), y);

    assert(y[0] == 3.0);
    assert(y[1] == 4.0);
    assert(y[2] == 5.0);
}

该程序可以在所有 BLAS、openBLAS 和 MKL 上正确编译和运行。（不幸的是，我没有苹果设备可以尝试。）好消息是这些库都没有检查零步幅，至少对于

axpy

。

c++ axpy.cpp -Iinclude/ `pkg-config --libs openblas` && ./a.out
c++ axpy.cpp -Iinclude/ `pkg-config --libs mkl` && ./a.out
c++ axpy.cpp -Iinclude/ `pkg-config --libs mkl-static-ilp64-seq` && ./a.out

为什么 blas gemm 函数系列中不允许非正向步幅？

问题描述投票：0回答：2

2个回答

最新问题

为什么 blas gemm 函数系列中不允许非正向步幅？

问题描述 投票：0回答：2

2个回答

最新问题

问题描述投票：0回答：2