mpirun hangs after the program finishes


When I run the following command, I get the expected output, but the program does not terminate immediately.

$ mpirun -np 2 echo 1
1
1

The program also does not respond to interrupts. After about a minute, I get the shell prompt back.

In other words: the command line mpirun -np 2 echo 1; echo 'done' runs successfully, but takes a long time to finish.
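To put a number on "a long time", the whole invocation can be timed. The following is only a minimal sketch: time_cmd is a hypothetical helper name, not part of Open MPI, and it assumes a POSIX shell with date +%s available.

```shell
#!/bin/sh
# Hypothetical helper: run any command and print how many whole seconds it took.
# The command's own stdout is redirected to stderr so that stdout carries
# only the elapsed time.
time_cmd() {
    start=$(date +%s)
    "$@" 1>&2
    status=$?
    end=$(date +%s)
    echo "$((end - start))"
    return "$status"
}

# Usage (assumes mpirun is on PATH):
#   time_cmd mpirun -np 2 echo 1
```

If the reported number is close to 60 even though the two lines of output appear at once, the delay is in mpirun's startup or shutdown rather than in the launched program.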

Update: I ran strace mpirun -np 2 echo 1.

The program hangs here:

sysinfo({uptime=5064793, loads=[153856, 184128, 229600], totalram=67362279424, freeram=26006364160, sharedram=8040448, bufferram=1739857920, totalswap=34359734272, freeswap=34358018048, procs=309, totalhigh=0, freehigh=0, mem_unit=1}) = 0
uname({sysname="Linux", nodename="euler", ...}) = 0
ioctl(13, _IOC(0, 0, 0x25, 0)
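The trace stops inside an ioctl on file descriptor 13, so a natural next step is to find out which file or device that descriptor refers to. The following is a rough sketch, assuming Linux with /proc mounted and that it is run from a second terminal while mpirun is still hanging. fd_target is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the file or device behind a process's descriptor.
# Linux only: /proc/<pid>/fd/<n> is a symlink to the open file.
fd_target() {
    # $1: process ID, $2: file descriptor number
    readlink "/proc/$1/fd/$2"
}

# Usage while mpirun is hanging (pgrep -n picks the most recently started match):
#   fd_target "$(pgrep -n mpirun)" 13
```

A device node in the output would indicate that the call is waiting on a kernel driver rather than on anything in MPI itself.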

And then here:

openat(AT_FDCWD, "/tmp/openmpi-sessions-216211@euler_0/42701", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
munmap(0x7f61ed88c000, 2127408)         = 0
munmap(0x7f61ee0a1000, 2101720)         = 0
close(9)                                = 0
munmap(0x7f61ede9e000, 2105664)         = 0
munmap(0x7f61ed685000, 2122480)         = 0
munmap(0x7f61eda95000, 2109856)         = 0
munmap(0x7f61ed47c000, 2130304)         = 0
munmap(0x7f61ed05b000, 2109896)         = 0
munmap(0x7f61ecc9a000, 3934648)         = 0
munmap(0x7f61ed25f000, 2212016)         = 0
munmap(0x7f61ec8e3000, 3894144)         = 0
munmap(0x7f61ec6bd000, 2248968)         = 0
munmap(0x7f61ea776000, 28999696)        = 0
munmap(0x7f61edc99000, 2110072)         = 0
exit_group(0)                           = ?
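The plain trace shows where mpirun stops, but not for how long. If the trace is recorded again with strace's -T option (which appends the time spent in each call as <seconds>) and written to a file with -o, a small filter can rank the calls by duration. This is only a sketch: slowest_syscalls and trace.log are hypothetical names, and calls that never return carry no duration and are skipped.

```shell
#!/bin/sh
# Record a trace with per-call durations first, for example:
#   strace -f -tt -T -o trace.log mpirun -np 2 echo 1

# Hypothetical helper: list the slowest calls in an strace log recorded with -T.
slowest_syscalls() {
    # $1: path to the strace log, $2: number of lines to show (default 5)
    awk 'match($0, /<[0-9]+\.[0-9]+>$/) {
        # Strip the surrounding angle brackets to get the duration in seconds.
        d = substr($0, RSTART + 1, RLENGTH - 2)
        print d "\t" $0
    }' "$1" | LC_ALL=C sort -rn | head -n "${2:-5}"
}

# Usage:
#   slowest_syscalls trace.log 10
```

Each output line starts with the duration in seconds, followed by the original trace line, so the calls responsible for the one-minute delay should appear at the top.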

Can you help me debug this further?

mpi openmpi
1 Answer

Apparently the NVIDIA driver was broken. Updating the driver to version 440.64.00 resolved the issue.
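To see which driver version is currently installed before and after updating, something like the following may help. It is a sketch that assumes the nvidia-smi tool shipped with the driver is on PATH. nvidia_driver_version is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the installed NVIDIA driver version,
# or a short notice if nvidia-smi is not available.
nvidia_driver_version() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # One line per GPU is printed; the driver is shared, so keep the first.
        nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
    else
        echo "nvidia-smi not found"
    fi
}

# Usage:
#   nvidia_driver_version
```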
