Hoping to get more eyes on a school project implementation. This particular function fails the correctness test (everything else passes). I suspect the logic that computes each rank's start and end is the culprit, but I'm not sure. The test that fails is the one checking that the original and rebalanced arrays have equal checksums, which suggests the writes overlap or corrupt each other.
#include <mpi.h>

#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <iostream>

void rebalance(const dist_sort_t *data, const dist_sort_size_t myDataCount, dist_sort_t **rebalancedData, dist_sort_size_t *rCount) {
    // Get the number of processes
    int nProcs;
    MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
    // Get the rank of this process
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    dist_sort_size_t global_N;
    // All-reduce the local counts (myDataCount) to get global_N
    MPI_Allreduce(&myDataCount, &global_N, 1, MPI_TYPE_DIST_SORT_SIZE_T, MPI_SUM, MPI_COMM_WORLD);
    uint64_t dataPerProc, limit;
    // Base data count per process
    dataPerProc = global_N / nProcs;
    // Number of processes that receive one extra element
    limit = global_N % nProcs;
    // Additional debugging, specifically for the limit > 0 case
    if (limit > 0) {
        std::cout << "Handling extra data - Rank: " << rank << ", Extra Data Limit: " << limit << std::endl;
    }
    // Assign the data size for the current process; the first 'limit' ranks get the extras
    dist_sort_size_t myCount = dataPerProc + (static_cast<uint64_t>(rank) < limit ? 1 : 0);
    std::cout << "myCount: " << myCount << std::endl;
    // Allocate the output array
    dist_sort_t *balanced = (dist_sort_t *)malloc(myCount * sizeof(dist_sort_t));
    size_t balanced_size = myCount * sizeof(dist_sort_t);
    std::cout << "balanced_size: " << balanced_size << std::endl;
    // Global starting and ending index of this rank's current data
    uint64_t myStartGlobal, myEnd;
    MPI_Exscan(&myDataCount, &myStartGlobal, 1, MPI_TYPE_DIST_SORT_SIZE_T, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) {
        myStartGlobal = 0; // MPI_Exscan leaves the result undefined on rank 0
    }
    myEnd = myStartGlobal + myDataCount - 1;
    // Create a window over the output buffer for one-sided communication
    MPI_Win win;
    MPI_Win_create(balanced, myCount * sizeof(dist_sort_t), sizeof(dist_sort_t), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(MPI_MODE_NOPRECEDE, win);
    uint64_t next = myStartGlobal;
    while (next <= myEnd) {
        uint64_t dest;
        if (next < global_N) {
            // Which process should receive the item at global index 'next'
            dest = next / dataPerProc;
        } else {
            // Assign to the last process if 'next' is outside the range of available data items
            dest = nProcs - 1;
        }
        uint64_t disp = next % dataPerProc;                             // offset within the destination rank
        uint64_t size = std::min(dataPerProc - disp, myEnd - next + 1); // number of elements to write
        if (dest < static_cast<uint64_t>(nProcs)) {
            // One-sided write into the destination rank's window
            MPI_Put(&data[next - myStartGlobal], size, MPI_TYPE_DIST_SORT_T, dest, disp, size, MPI_TYPE_DIST_SORT_T, win);
        }
        next += size;
    }
    MPI_Win_fence(MPI_MODE_NOSUCCEED, win);
    // Release the window; the buffer itself remains owned by the caller
    MPI_Win_free(&win);
    // Hand the rebalanced buffer back to the caller
    *rebalancedData = balanced;
    *rCount = myCount;
}
Thanks!
MPI_Put is not atomic, so any overlapping writes can corrupt the target buffer. Use MPI_Accumulate(..., op = MPI_REPLACE, ...) instead; it performs element-wise atomic writes.
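As a minimal sketch (assuming the same window and the MPI_TYPE_DIST_SORT_T datatype from your code), the MPI_Put call inside the loop would become:

    // Same arguments as MPI_Put, plus the MPI_REPLACE op: each element is
    // replaced atomically, so concurrent writes to the same window location
    // cannot interleave mid-element.
    MPI_Accumulate(&data[next - myStartGlobal], size, MPI_TYPE_DIST_SORT_T,
                   dest, disp, size, MPI_TYPE_DIST_SORT_T,
                   MPI_REPLACE, win);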
If that does not solve the problem, please update the question with more information.