I'm experimenting with mpi4py to make sure I understand how to use it before I actually build it into my overall code structure. I have the following code:
```python
# mpirun -n 5 python3 MPI_function.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

np.random.seed(0)
N = 21345  # arbitrary
values = np.random.rand(N, 1000)

if rank == 0:
    section = np.zeros(int(N / size + N % size))
    start = 0
else:  # rank != 0:
    section = np.zeros(int(N / size))
    start = rank * len(section) + N % size

for i in range(len(section)):
    section[i] = np.mean(values[start + i])
    # print(start + i)

if rank != 0:
    comm.Send(section, dest = 0, tag = 14)
else:  # rank == 0:
    results = np.pad(section, (0, N - len(section)), constant_values = 0)
    for r in range(1, size):
        temp = np.zeros(N % size)
        comm.Recv(temp, source = r, tag = 14)
        start = r * len(N % size) + N % size
        for i in range(N % size):
            results[start + i] = temp[i]
```
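The split in the code above gives every rank `N // size` rows and puts the remainder `N % size` on rank 0, so rank `r >= 1` starts at `r * (N // size) + N % size`. A minimal pure-Python sketch of that layout follows; the helper name `partition_rows` is hypothetical (not part of mpi4py or NumPy). It makes each rank's start offset and element count explicit, and those counts are exactly what the receive buffers on rank 0 have to match.

```python
def partition_rows(n_rows, n_ranks):
    """Return a (start, count) pair per rank for the layout used above.

    Every rank gets n_rows // n_ranks rows; rank 0 also takes the
    remainder n_rows % n_ranks, so rank r >= 1 starts at r * base + remainder.
    """
    base, remainder = divmod(n_rows, n_ranks)
    layout = []
    for rank in range(n_ranks):
        if rank == 0:
            layout.append((0, base + remainder))
        else:
            layout.append((rank * base + remainder, base))
    return layout


print(partition_rows(21345, 5))
# → [(0, 4269), (4269, 4269), (8538, 4269), (12807, 4269), (17076, 4269)]
```

For N = 21345 and 5 ranks the remainder is 0, so each worker sends 4269 doubles (4269 × 8 = 34152 bytes). A receive buffer sized with `N % size` holds 0 elements instead.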
Unfortunately, I get the following error, which doesn't make much sense to me:
```
Traceback (most recent call last):
  File "[FILEPATH]/MPI_function.py", line 33, in <module>
    comm.Recv(temp, source = r, tag = 14)
  File "mpi4py/MPI/Comm.pyx", line 299, in mpi4py.MPI.Comm.Recv
mpi4py.MPI.Exception: Message truncated, error stack:
internal_Recv(127).......: MPI_Recv(buf=0x13be04970, count=0, MPI_DOUBLE, 1, 14, MPI_COMM_WORLD, status=0x1) failed
MPIDIG_recv_type_init(77): Message from rank 1 and tag 14 truncated; 0 bytes received but buffer size is 34152

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 31516 RUNNING AT Jacob-Ivanovs-MacBook-Air.local
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed: 9 (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
```
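The post "Message Truncated in MPI_Recv" suggests that the order in which messages arrive could be the cause. However, this problem is embarrassingly parallel and does not depend on arrival order, so I'm not sure that explanation applies here. I'm not sure what is going on and would appreciate any suggestions or ideas. Thanks!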
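It turns out the error came from a few rookie mistakes caused by bad sleep habits. Thanks to @petschge for spotting them first. @Gilles Gouaillardet, you caught one too while I was typing this.

Message order was not the issue. Each `Recv` names its source rank and tag explicitly, so it can only match the message from that rank. The problem was the buffer size. Rank 0 allocated `temp` with `N % size` elements, which is 0 for N = 21345 and 5 ranks. Every other rank sends `int(N / size)` = 4269 doubles, or 4269 × 8 = 34152 bytes. That is the `count=0` versus 34152 mismatch reported in the error. The offset `len(N % size)` was also wrong, because `len()` cannot be applied to an integer. The last section should read: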
```python
if rank != 0:
    comm.Send(section, dest = 0, tag = 14)
else:  # rank == 0:
    results = np.pad(section, (0, N - len(section)), constant_values = 0)
    for r in range(1, size):
        temp = np.zeros(int(N / size))  # NOT np.zeros(N % size)
        comm.Recv(temp, source = r, tag = 14)
        start = r * int(N / size) + N % size  # NOT r * len(N % size) + N % size
        for i in range(int(N / size)):  # NOT range(N % size)
            results[start + i] = temp[i]
```
I hope this encourages future programmers to double-check their own code for similar mistakes.