So, I am trying to set up a communication routine in which I transfer some 2D arrays from a number of processors back to the root using MPI_Gatherv. I have been working towards this with scatterv and gatherv, starting with a simple example: a 4x4 array that I am trying to split into four 2x2 arrays and scatter across 4 processors with the Scatterv function.
At the moment I have got to the point where the root processor manages to print its 2x2 array, but as soon as the next processor tries to print its local data I get a segfault; if I don't try to print the local arrays, I get no errors at all. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv) {
    int rank, numprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int ndim = 4;
    int **ga = NULL;

    // Create the global array. Size is 4x4, root only.
    if (rank == 0) {
        ga = malloc(ndim*sizeof(int*));
        for (int i = 0; i < ndim; i++)
            ga[i] = malloc(ndim*sizeof(int));
        for (int i = 0; i < ndim; i++)
            for (int j = 0; j < ndim; j++)
                ga[i][j] = i*ndim + j;  // fill with 0..15, row by row
    }

    // Print the global array.
    if (rank == 0) {
        printf("Send array:\n");
        for (int i = 0; i < ndim; i++) {
            for (int j = 0; j < ndim; j++)
                printf(" %d ", ga[i][j]);
            printf("\n");
        }
    }

    // Create the local arrays on all procs; the local size is 2x2.
    int ndim_loc = ((ndim*ndim)/numprocs)/2;
    int **la = malloc(ndim_loc*sizeof(int*));
    for (int i = 0; i < ndim_loc; i++)
        la[i] = malloc(ndim_loc*sizeof(int));

    int sizes[2] = {ndim, ndim};             // global size
    int subsizes[2] = {ndim_loc, ndim_loc};  // local size: 4 procs, ndim = 4, each proc has a 2x2
    int starts[2] = {0, 0};                  // starting point of the subarray in the global array
    if (rank == 0) {
        printf("Global arr dims = [%d,%d]\n", sizes[0], sizes[1]);
        printf("Sub arr dims = [%d,%d]\n", subsizes[0], subsizes[1]);
        printf("start point in global = [%d,%d]\n", starts[0], starts[1]);
    }

    // Prepare the MPI send type: a 2x2 subarray of the 4x4 global array,
    // then resize its extent from one int to two ints, so that one extent
    // is one row of the subarray.
    MPI_Datatype sub_arr, type;
    MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_C, MPI_INT, &sub_arr);
    MPI_Type_create_resized(sub_arr, 0, 2*sizeof(int), &type);
    MPI_Type_commit(&type);

    // Set up sendcounts (each processor receives 1 subarray) and the
    // displacements relative to global array[0][0].
    // .___.___.___.___.
    // |[0]|   |[1]|   |   [i] marks the starting position of the subarray
    // |___|___|___|___|   in the global one for processor i. So the
    // |   |   |   |   |   displacements, in units of the new extent
    // |___|___|___|___|   (two ints), are: {0, 1, 4, 5}.
    // |[2]|   |[3]|   |
    // |___|___|___|___|
    // |   |   |   |   |
    // |___|___|___|___|
    int scounts[numprocs];
    int displs[numprocs];
    for (int i = 0; i < numprocs; i++) {
        scounts[i] = 1;
        if (i == 0)
            displs[i] = 0;
        else if (i % 2 == 0)
            displs[i] = displs[i-1] + 3;  // jump down to the next row of blocks
        else
            displs[i] = displs[i-1] + 1;
    }
    MPI_Barrier(MPI_COMM_WORLD);
    printf("I AM RANK %d, displ = %d, scount = %d\n", rank, displs[rank], scounts[rank]);

    // Sending uses the newly defined type; the receiving side is 4 MPI_INTs.
    MPI_Scatterv(&ga, scounts, displs, type, &la, ndim_loc*ndim_loc, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    // Print the local array.
    printf("RANK = %d, local data:\n", rank);
    for (int i = 0; i < ndim_loc; i++) {
        for (int j = 0; j < ndim_loc; j++)
            printf(" %d ", la[i][j]);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}
I found some questions answered along these lines (e.g. sending blocks of 2D array in C using MPI, MPI C - Gather 2d Array Segments into One Global Array), which helped me enormously in understanding what is going on in the actual memory layout, but I can't seem to get this scatter to run, and I'm not sure what I'm doing wrong.
In one of those answers, the solution was to allocate the memory of the receiving processors the other way around, i.e. as one contiguous block with the row pointers set afterwards:
int **la = NULL;
int *la_pre = NULL;
int ndim_loc = ((ndim*ndim)/numprocs)/2;
la_pre = malloc((ndim_loc*ndim_loc)*sizeof(int));
la = malloc(ndim_loc*sizeof(int*));
for (int i = 0; i < ndim_loc; i++)
    la[i] = &(la_pre[i*ndim_loc]);
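(For context: with this layout the whole 2x2 block is a single contiguous run of ints, so la_pre and la[0] point at the same element. A quick standalone check of that, independent of any MPI calls:)

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int ndim_loc = 2;
    int *la_pre = malloc(ndim_loc*ndim_loc*sizeof(int));
    int **la = malloc(ndim_loc*sizeof(int*));
    for (int i = 0; i < ndim_loc; i++)
        la[i] = &la_pre[i*ndim_loc];
    // Rows are adjacent in the backing buffer: row 1 starts exactly
    // ndim_loc ints after row 0, with no per-row malloc gaps.
    printf("row gap = %td ints\n", la[1] - la[0]);  // prints: row gap = 2 ints
    free(la);
    free(la_pre);
    return 0;
}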
Unfortunately, this does not seem to work, and I get the same output as before:
mpirun -np 4 ./a.out
Send array:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
recieve array:
0 0
0 0
Global arr dims = [4,4]
Sub arr dims = [2,2]
start point in global = [0,0]
I AM RANK 0, displ = 0, scount = 1
I AM RANK 1, displ = 1, scount = 1
I AM RANK 2, displ = 4, scount = 1
I AM RANK 3, displ = 5, scount = 1
RANK = 0, local data:
0 1
4 5
RANK = 1, local data:
RANK = 2, local data:
RANK = 3, local data:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 928 RUNNING AT login03
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================
Any help with this would be greatly appreciated!
This may not completely solve your problem, but conceptually the subarrays look like this:
//Displacements relative to global array[0][0].
// .___.___.___.___.
// |[0]|   |[1]|   |   [i] marks the starting position of the subarray
// |___|___|___|___|   in the global one for processor i. So the
// |   |   |   |   |   displacements, in units of the new extent
// |___|___|___|___|   (two ints), are: {0, 1, 4, 5}.
// |[2]|   |[3]|   |
// |___|___|___|___|
// |   |   |   |   |
// |___|___|___|___|
But in memory, they look more like this:
|[0]| | | |[1]| | | |[2]| | | |[3]| | | |
which, as that layout shows, invalidates logic that relies on the 2D-style displacements.
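To make the linear picture concrete, here is a small standalone sketch (plain C, no MPI) of where the four elements of each rank's 2x2 block actually sit in the flat, row-major 4x4 buffer, using the block numbering from the diagram above:

#include <stdio.h>

int main(void) {
    int ndim = 4, nloc = 2;
    // Block b = (bi, bj) covers rows bi*nloc..bi*nloc+1 and
    // cols bj*nloc..bj*nloc+1; element (i, j) lives at i*ndim + j.
    for (int b = 0; b < 4; b++) {
        int bi = b / 2, bj = b % 2;
        printf("block [%d]: flat indices", b);
        for (int i = 0; i < nloc; i++)
            for (int j = 0; j < nloc; j++)
                printf(" %d", (bi*nloc + i)*ndim + (bj*nloc + j));
        printf("\n");
    }
    return 0;
}

This prints {0,1,4,5}, {2,3,6,7}, {8,9,12,13} and {10,11,14,15}: the two rows of any one block are separated by a full row of the global array, which is exactly the stride the subarray type has to encode, and which a plain contiguous buffer does not have.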