So, I am trying to set up a communication routine in which I transfer some 2D arrays from a number of processors back to the root using MPI_Gatherv. I have been working towards this with scatterv and gatherv, starting with a simple example: a 4x4 array that I am trying to split into four 2x2 arrays and scatter across 4 processors with the Scatterv function.
At the moment I have got to the point where the root processor manages to print its 2x2 array, but as soon as the next processor tries to print its local data I get a segfault; if I don't try to print the local arrays, I get no errors at all. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv) {
    int rank, numprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int ndim = 4;
    int **ga = NULL;

    // Create the global array. Size is 4x4, root only.
    if (rank == 0) {
        ga = malloc(ndim*sizeof(int*));
        for (int i = 0; i < ndim; i++)
            ga[i] = malloc(ndim*sizeof(int));
        for (int i = 0; i < ndim; i++)
            for (int j = 0; j < ndim; j++)
                ga[i][j] = i*ndim + j;  // fill with 0..15, row by row
    }

    // Print the global array.
    if (rank == 0) {
        printf("Send array:\n");
        for (int i = 0; i < ndim; i++) {
            for (int j = 0; j < ndim; j++)
                printf(" %d ", ga[i][j]);
            printf("\n");
        }
    }

    // Create the local arrays on all procs; the local size is 2x2.
    int ndim_loc = ((ndim*ndim)/numprocs)/2;
    int **la = malloc(ndim_loc*sizeof(int*));
    for (int i = 0; i < ndim_loc; i++)
        la[i] = malloc(ndim_loc*sizeof(int));

    int sizes[2] = {ndim, ndim};             // global size
    int subsizes[2] = {ndim_loc, ndim_loc};  // local size: 4 procs, ndim = 4, each proc has a 2x2
    int starts[2] = {0, 0};                  // starting point of the subarray in the global array
    if (rank == 0) {
        printf("Global arr dims = [%d,%d]\n", sizes[0], sizes[1]);
        printf("Sub arr dims = [%d,%d]\n", subsizes[0], subsizes[1]);
        printf("start point in global = [%d,%d]\n", starts[0], starts[1]);
    }

    // Prepare the MPI send type: a 2x2 subarray of the 4x4 global array,
    // then resize its extent from one int to two ints, so that one extent
    // is one row of the subarray.
    MPI_Datatype sub_arr, type;
    MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_C, MPI_INT, &sub_arr);
    MPI_Type_create_resized(sub_arr, 0, 2*sizeof(int), &type);
    MPI_Type_commit(&type);

    // Set up sendcounts (each processor receives 1 subarray) and the
    // displacements relative to global array[0][0].
    // .___.___.___.___.
    // |[0]|   |[1]|   |   [i] marks the starting position of the subarray
    // |___|___|___|___|   in the global one for processor i. So the
    // |   |   |   |   |   displacements, in units of the new extent
    // |___|___|___|___|   (two ints), are: {0, 1, 4, 5}.
    // |[2]|   |[3]|   |
    // |___|___|___|___|
    // |   |   |   |   |
    // |___|___|___|___|
    int scounts[numprocs];
    int displs[numprocs];
    for (int i = 0; i < numprocs; i++) {
        scounts[i] = 1;
        if (i == 0)
            displs[i] = 0;
        else if (i % 2 == 0)
            displs[i] = displs[i-1] + 3;  // jump down to the next row of blocks
        else
            displs[i] = displs[i-1] + 1;
    }
    MPI_Barrier(MPI_COMM_WORLD);
    printf("I AM RANK %d, displ = %d, scount = %d\n", rank, displs[rank], scounts[rank]);

    // Sending uses the newly defined type; the receiving side is 4 MPI_INTs.
    MPI_Scatterv(&ga, scounts, displs, type, &la, ndim_loc*ndim_loc, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    // Print the local array.
    printf("RANK = %d, local data:\n", rank);
    for (int i = 0; i < ndim_loc; i++) {
        for (int j = 0; j < ndim_loc; j++)
            printf(" %d ", la[i][j]);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}
I found some questions answered along these lines (e.g. sending blocks of 2D array in C using MPI, MPI C - Gather 2d Array Segments into One Global Array), which helped me enormously in understanding what is going on in the actual memory layout, but I can't seem to get this scatter to run, and I'm not sure what I'm doing wrong.
In one of those answers, the solution was to allocate the memory of the receiving processors the other way around, i.e. as one contiguous block with the row pointers set afterwards:
int **la = NULL;
int *la_pre = NULL;
int ndim_loc = ((ndim*ndim)/numprocs)/2;
la_pre = malloc((ndim_loc*ndim_loc)*sizeof(int));
la = malloc(ndim_loc*sizeof(int*));
for (int i = 0; i < ndim_loc; i++)
    la[i] = &(la_pre[i*ndim_loc]);
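(For context: with this layout the whole 2x2 block is a single contiguous run of ints, so la_pre and la[0] point at the same element. A quick standalone check of that, independent of any MPI calls:)

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int ndim_loc = 2;
    int *la_pre = malloc(ndim_loc*ndim_loc*sizeof(int));
    int **la = malloc(ndim_loc*sizeof(int*));
    for (int i = 0; i < ndim_loc; i++)
        la[i] = &la_pre[i*ndim_loc];
    // Rows are adjacent in the backing buffer: row 1 starts exactly
    // ndim_loc ints after row 0, with no per-row malloc gaps.
    printf("row gap = %td ints\n", la[1] - la[0]);  // prints: row gap = 2 ints
    free(la);
    free(la_pre);
    return 0;
}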
Unfortunately, this does not seem to work, and I get the same output as before:
mpirun -np 4 ./a.out
Send array:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
recieve array:
0 0
0 0
Global arr dims = [4,4]
Sub arr dims = [2,2]
start point in global = [0,0]
I AM RANK 0, displ = 0, scount = 1
I AM RANK 1, displ = 1, scount = 1
I AM RANK 2, displ = 4, scount = 1
I AM RANK 3, displ = 5, scount = 1
RANK = 0, local data:
0 1
4 5
RANK = 1, local data:
RANK = 2, local data:
RANK = 3, local data:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 928 RUNNING AT login03
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================
Any help with this would be greatly appreciated!
This may not completely solve your problem, but conceptually the subarrays look like this:
//Displacements relative to global array[0][0].
// .___.___.___.___.
// |[0]|   |[1]|   |   [i] marks the starting position of the subarray
// |___|___|___|___|   in the global one for processor i. So the
// |   |   |   |   |   displacements, in units of the new extent
// |___|___|___|___|   (two ints), are: {0, 1, 4, 5}.
// |[2]|   |[3]|   |
// |___|___|___|___|
// |   |   |   |   |
// |___|___|___|___|
But in memory, they look more like this:
|[0]| | | |[1]| | | |[2]| | | |[3]| | | |
which, as that layout shows, invalidates logic that relies on the 2D-style displacements.
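To make the linear picture concrete, here is a small standalone sketch (plain C, no MPI) of where the four elements of each rank's 2x2 block actually sit in the flat, row-major 4x4 buffer, using the block numbering from the diagram above:

#include <stdio.h>

int main(void) {
    int ndim = 4, nloc = 2;
    // Block b = (bi, bj) covers rows bi*nloc..bi*nloc+1 and
    // cols bj*nloc..bj*nloc+1; element (i, j) lives at i*ndim + j.
    for (int b = 0; b < 4; b++) {
        int bi = b / 2, bj = b % 2;
        printf("block [%d]: flat indices", b);
        for (int i = 0; i < nloc; i++)
            for (int j = 0; j < nloc; j++)
                printf(" %d", (bi*nloc + i)*ndim + (bj*nloc + j));
        printf("\n");
    }
    return 0;
}

This prints {0,1,4,5}, {2,3,6,7}, {8,9,12,13} and {10,11,14,15}: the two rows of any one block are separated by a full row of the global array, which is exactly the stride the subarray type has to encode, and which a plain contiguous buffer does not have.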