(C++) 为什么犰狳 `eigs_gen` 在多线程时中断?

问题描述 投票:0回答:0

TLDR

C++ 犰狳库函数用于查找复杂稀疏矩阵的特征值和特征向量

eigs_gen
如果在多个线程中同时运行,即使这些线程不从相同的数据读取或写入,也会失败(崩溃或给出错误的结果)。为什么会这样,它可以多线程运行吗?

最小示例

考虑以下代码,它使用 Armadillo 库在两个不同的线程中用 C++ 查找多个(这里是 2 个)大型复方矩阵 (~20000x20000) 的最低特征值和对应的特征向量。线程不共享任何数据,因此不需要互斥量,即便如此,这个例子在实际使矩阵对角化的函数之前放置了一个互斥量:

#include<iostream>
#include<complex>
#include<exception>
#include <armadillo>
#include<thread>
#include<string>

using namespace std;
using namespace arma;


std::mutex stupid_lock;



void get_eigenvalue(int id, size_t N, double* out)
{
    sp_cx_mat H (N, N);//Sparse matrix, as we have mostly 0 entries, initialized as only zeros by default

    //As a demonstration, a hermitian matrix, which should have real eigenvalues
    for (uint i = 0; i <N; ++i)
    {
        H(i, i)=1*i;

        if (i>0)
            H(i, i-1)=-1i*double(i);
        if (i<N-1)
            H(i, i+1)=+1i*double(i);
    }


    //Somewhere to put the eigenvalues and vectors
    cx_vec eigval;
    cx_mat eigvec;

    stupid_lock.lock();
    //It makes no sense that I need thism everything was created locally in this function
    eigs_gen( eigval, eigvec, H, 1, "sr");//The lowest eigenvalue
    stupid_lock.unlock();

    *out = eigval[0].real();//Should be real, as the matrix is hermitian

    if ( eigval[0].imag()> 1e-9 )//Sanity check, if this is not real something has failed (in practice allow for slight rounding errors)
    {
        throw std::runtime_error("ERROR in thread "+to_string(id)+", energy "+to_string(eigval[0].real())+"+"+to_string(eigval[0].imag())+"*i + has non-zero imaginary part");//I don't even bother catching the possible error, since it should be mathematically impossible for it to trigger
    }
}

int main()
{
    size_t N = 20000;//The error only happens if the functions take a substantial amount of time, this takes around half a second to diagonalise on my computer
    double out1=0;
    double out2=0;


    thread t1(get_eigenvalue,1,N,&out1);
    thread t2(get_eigenvalue,2,N,&out2);

    t1.join();
    t2.join();
    cout<<"These numbers should be the same "<<out1<<" "<<out2<<endl;

    return 0;
}

注意这个例子中使用的矩阵是厄密矩阵,所以我们期望打印的特征值是实数,如果不是这样,程序会抛出异常:这个程序的输出正是你所期望的

$ bin/minimum_failure.exe                                                                                             
These numbers should be the same -19936.6 -19936.6

这每次都很完美。

但是这里需要很长时间的一件事是运行

eigs_gen
,所以让一个线程等待另一个线程完成几乎和根本没有使用线程一样慢,并且线程不会写入相同的数据,所以我看不出互斥量应该在那里的原因,唉,注释掉互斥量
\\stupid_lock.lock();
\\stupid_lock.unlock();
会导致程序以大量不同的方式失败,最常见的是:

$ bin/minimum_failure.exe                                                                                             
[1]    540245 segmentation fault (core dumped)  bin/minimum_failure.exe                                               

$ bin/minimum_failure.exe                                                                                             
warning: eigs_gen(): ARPACK error 
warning: eigs_gen(): ARPACK error -14 in neupd()-15
 in neupd()
[1]    541755 segmentation fault (core dumped)  bin/minimum_failure.exe

但有时线程“成功”但找到的特征值是错误的(并且每次都是随机的)有时它们是真实的但错误的:

$ bin/minimum_failure.exe     
These numbers should be the same -4.78507e+16 -3.69731e+12

有时厄密矩阵具有复杂的特征值(这在数学上是不可能的!):

$ bin/minimum_failure.exe                                                                                             
terminate called after throwing an instance of 'std::runtime_error'                                                   
  what():  ERROR in thread 2, energy -16346042235.187666+52718696738.912788*i + has non-zero imaginary part           
[1]    540310 IOT instruction (core dumped)  bin/minimum_failure.exe                                                  

这显然不是舍入错误,这是一个完全错误的结果。

但是为什么会这样呢?只有当我们从不同线程读取或写入相同数据时才需要互斥锁,而这些线程不会这样做。幕后的犰狳是否使用某种全局范围变量?并且在多个不同的线程中同时使用矩阵的特征值分解是不是不可能?

编译器、库和系统

带有 Linux 6.1.15-1-lts 的 Distro Arch Linux

犰狳 12.0.1-1

还有提供 blas 3.9.0 的 openblas 0.3.21-4

海湾合作委员会 12.2.1-2

我正在用标志编译:

-std=c++17 -O2 -larmadillo  -lpthread

我的 CPU 有 20 个线程的空间(通过在 C++ 中调用

uint64_t max_threads = thread::hardware_concurrency();
或在终端中调用
lscpu
找到)

c++ multithreading armadillo
© www.soinside.com 2019 - 2024. All rights reserved.