MPI collective communication

Problem description

I am trying to code quicksort in MPI. The parallelization scheme is simple: the root scatters the list across MPI_COMM_WORLD, each process runs qsort() on its sub-array, and MPI_Gather() brings all the sub-arrays back to the root, which runs qsort() on the result once more. Simple enough, but I get an error. My guess was that the sub-array size might be wrong, since it is just the list size divided by comm_size, which could lead to a segmentation fault. But I use a list of size 1000 and 4 processes, so the division gives exactly 250 and there should be no segmentation fault from that. Yet there is one. Can you tell me where I went wrong?

int main()
{
    int array [1000];
    int arrsize;
    int chunk;
    int* subarray;
    int rank ;
    int comm_size;
    MPI_Init(NULL,NULL);
    MPI_Comm_size(MPI_COMM_WORLD,&comm_size);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    if(rank==0)
    {
        time_t t;
        srand((unsigned)time(&t));
        int arrsize = sizeof(array) / sizeof(int);
        for (int i = 0; i < arrsize; i++)
            array[i] = rand() % 1000;
        printf("\n this is processor %d and the unsorted array is:",rank);
        printArray(array,arrsize);          
    }

    MPI_Scatter( array,arrsize,MPI_INT, subarray,chunk,MPI_INT,0,MPI_COMM_WORLD);
    chunk = (int)(arrsize/comm_size);
    subarray = (int*)calloc(arrsize,sizeof(int));

    if(rank != 0)
    {
        qsort(subarray,chunk,sizeof(int),comparetor);
    }

    MPI_Gather( subarray,chunk, MPI_INT,array, arrsize, MPI_INT,0, MPI_COMM_WORLD);
    if(rank==0)
    {
        qsort(array,arrsize,sizeof(int),comparetor);
        printf("\n this is processor %d and this is sorted array: ",rank);
        printArray(array,arrsize);
    }
    free(subarray);
    MPI_Finalize();
    return 0;
}

And the error says:

Invalid MIT-MAGIC-COOKIE-1 key[h:04865] *** Process received signal ***
[h:04865] Signal: Segmentation fault (11)
[h:04865] Signal code: Address not mapped (1)
[h:04865] Failing at address: 0x421e45
[h:04865] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7f1906b29210]
[h:04865] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18e533)[0x7f1906c71533]
[h:04865] [ 2] /lib/x86_64-linux-gnu/libopen-pal.so.40(+0x4054f)[0x7f190699654f]
[h:04865] [ 3] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_datatype_sndrcv+0x51a)[0x7f1906f3288a]
[h:04865] [ 4] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_scatter_intra_basic_linear+0x12c)[0x7f1906f75dec]
[h:04865] [ 5] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Scatter+0x10d)[0x7f1906f5952d]
[h:04865] [ 6] ./parallelQuickSortMPI(+0xc8a5)[0x5640c424b8a5]
[h:04865] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f1906b0a0b3]
[h:04865] [ 8] ./parallelQuickSortMPI(+0xc64e)[0x5640c424b64e]
[h:04865] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node h exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
c++ c mpi openmpi
1 Answer

The cause of the segmentation fault is in the following lines.

MPI_Scatter( array,arrsize,MPI_INT, subarray,chunk,MPI_INT,0,MPI_COMM_WORLD);
chunk = (int)(arrsize/comm_size);
subarray = (int*)calloc(arrsize,sizeof(int));

You allocate subarray and compute the chunk size only after the MPI_Scatter call. MPI_Scatter is a collective operation: the receive buffer must be allocated and the receive count must be set before the call is made.

chunk = (int)(arrsize/comm_size);
subarray = (int*)calloc(arrsize,sizeof(int));
MPI_Scatter( array,arrsize,MPI_INT, subarray,chunk,MPI_INT,0,MPI_COMM_WORLD);

This is the correct order, and it gets you past the immediate segmentation fault. Note, however, that a few related problems remain in the posted code: the outer arrsize is never initialized (the declaration inside if(rank==0) shadows it), so chunk is computed from a garbage value on every rank; the send and receive counts of MPI_Scatter and MPI_Gather should both be chunk, not arrsize; and rank 0 never sorts its own chunk.
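For reference, here is a minimal corrected sketch (not the original poster's exact code) that applies the reordering above and also addresses those remaining points. The comparetor and printArray helpers are illustrative stand-ins for the ones omitted from the question, and it assumes arrsize is evenly divisible by comm_size (e.g. 1000 / 4 = 250).

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

/* qsort comparator for ints, ascending (stand-in for the omitted comparetor) */
static int comparetor(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* stand-in for the omitted printArray */
static void printArray(const int *a, int n)
{
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
}

int main(void)
{
    int array[1000];
    int arrsize = sizeof(array) / sizeof(int);   /* known on every rank, not only on root */
    int rank, comm_size;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* assumes arrsize is divisible by comm_size */
    int chunk = arrsize / comm_size;
    int *subarray = malloc(chunk * sizeof(int)); /* allocate the receive buffer before the collective */

    if (rank == 0) {
        srand((unsigned)time(NULL));
        for (int i = 0; i < arrsize; i++)
            array[i] = rand() % 1000;
        printf("this is processor %d and the unsorted array is:\n", rank);
        printArray(array, arrsize);
    }

    /* the root sends `chunk` ints to each rank; every rank receives `chunk` ints */
    MPI_Scatter(array, chunk, MPI_INT, subarray, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    /* every rank, including rank 0, sorts its own chunk */
    qsort(subarray, chunk, sizeof(int), comparetor);

    /* gather `chunk` ints from each rank back into array on the root */
    MPI_Gather(subarray, chunk, MPI_INT, array, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* the gathered array is only sorted per chunk, so sort (or merge) once more */
        qsort(array, arrsize, sizeof(int), comparetor);
        printf("this is processor %d and this is the sorted array:\n", rank);
        printArray(array, arrsize);
    }

    free(subarray);
    MPI_Finalize();
    return 0;
}

Build and run it the usual way, e.g. mpicc parallel_qsort.c -o parallel_qsort followed by mpirun -np 4 ./parallel_qsort. Re-sorting the whole array on the root works, but merging the already sorted chunks would be cheaper for large inputs.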
