Second call to MPI_Bcast sends old data from the first call

Question description (votes: 1, answers: 1)

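I'm running into a very strange bug where a call to MPI_Bcast sends out the wrong value. When I check the value being sent on the root process, it prints the correct value, but on every other task it prints the old value.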

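I've tried searching for similar questions, but every result came down to people either not calling Bcast on all tasks, or trying to use it where a gather would be more appropriate.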

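Because Bcast isn't sending the correct data, my lower tasks (every rank other than 0) end up stuck in an infinite loop. Pressing ^C makes them exit, but I need the code to exit on its own (which it should, if Bcast would behave).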

Simplified version of the code:

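#include <mpi.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* ...other include statements (not shown in the original post) */
int main(int argc, char *argv[])
{
        /* variable declarations here (not shown in the original post) */
        MPI_Init(&argc, &argv);                 /* MPI is initialized up here */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);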
        if (rank ==0)
        {
                input = fopen(argv[1],"r");
                output = fopen("fakeOutput.txt", "w+");
                while(1)
                {
                        count = 1; //reset counter
                        if(exitFlag)  break; 
                        command = (char*)calloc(89,sizeof(char));
                        command[0] = '.';
                        command[1] = '/';
                        batLine = (char*)calloc(86,sizeof(char));
                        for(i=0; i < 16; i++) 
                        {
                                if(fgets(batLine,86,input) != NULL)
                                {
                                        if(loopCount> 0)
                                        {
                                                Continue = true; 
                                             MPI_Bcast(&Continue,1,MPI_C_BOOL,0,MPI_COMM_WORLD);
                                        }
                                        if(i==0)
                                        {
                                                strcat(command,batLine);
                                                printf("rank0 gets: %s\n", command);
                                                fflush(stdout);
                                        }
                                        else
                                        { 
                                                MPI_Send(batLine,85,MPI_CHAR,i,i,MPI_COMM_WORLD);
                                                printf("sent rank%d: %s\n",i,batLine);
                                                fflush(stdout);
                                                count++;
                                        }
                                }
                                else
                                {
                                        Continue = false;
                                        exitFlag = true; //flag to break out of while loop
                                        free(batLine);
                                        batLine = (char*)calloc(86,sizeof(char));
                                        batLine[0]='e';
                                        MPI_Send(batLine,85,MPI_CHAR,i,i,MPI_COMM_WORLD);
                                 }
                        }
                        free(batLine);
                        //system(command);      ///to run batch file line
                        delay(500); //to simulate time of the command running
                        MPI_Barrier(MPI_COMM_WORLD); 
                        fprintf(output,"%s", command); //rank 0 has first spectrum
                        free(command);
                        outputFile = (char*)calloc(33,sizeof(char));
                        for (i=1; i<count;i++) //task 0 doesn't send data. have to start at 1
                        {
                                MPI_Recv(outputFile,33,MPI_CHAR,i,16,MPI_COMM_WORLD,&stat2);
                                printf("rank 0 recieved data from %d\n",stat2.MPI_SOURCE);
                                fflush(stdout);
                                fprintf(output,"%s\n",outputFile);
                                printf("Data:%s\n",outputFile);
                        }
                        MPI_Barrier(MPI_COMM_WORLD);
                        printf("continue after barrier:%d\n",Continue);
                        free(outputFile);
                        loopCount ++;
                        if(exitFlag)
                        {
                                Continue = false;
                                MPI_Bcast(&Continue,1,MPI_C_BOOL,0,MPI_COMM_WORLD);
                                printf("sent:%d\n", Continue);
                                break;
                        }
                }
                fclose(input);
                fclose(output);
                printf("files closed\n");

        }
        else
        {
            while(1)
            {
                command = (char*)calloc(89,sizeof(char));
                sentbatch = (char*)calloc(86,sizeof(char));
                spectrum = (char*)calloc(33, sizeof(char));
                command[0] = '.';
                command[1] = '/';
                MPI_Recv(sentbatch,86,MPI_CHAR,0,rank,MPI_COMM_WORLD,&stat);
                printf("rank%d was sent data from%d\n",rank,stat.MPI_SOURCE);
                fflush(stdout);
                if(strncmp(sentbatch,"e",1) == 0)
                {
                        noSend = true; //don't want to send back the place holder data
                }
                strcat(command,sentbatch); //adds needed ./ before batch data
                 free(sentbatch); //don't want to waste memory space
                //system(command); //should run batch line
                /* switch statement giving different delay times to different tasks here (not shown) */
                MPI_Barrier(MPI_COMM_WORLD);
                 if(noSend == false)
                {
                        for(i=0; i<31; i++)
                        {
                                spectrum[i] = command[i+13];
                        }
                        free(command);
                        fflush(stdout);
                        printf("sending:%s\n",spectrum);
                        fflush(stdout);
                        MPI_Send(spectrum, 33,MPI_CHAR,0,16,MPI_COMM_WORLD);
                        free(spectrum);
                }
                MPI_Barrier(MPI_COMM_WORLD);
                MPI_Bcast(&lowContinue,1,MPI_C_BOOL,0,MPI_COMM_WORLD);
                fflush(stdout);
                printf("continue: %d\n",lowContinue);
                if(lowContinue == false)
                        break;
            }
           printf("end for rank%d \n", rank);
        }
        MPI_Finalize();
        printf("closed mpi");
        return(0);
}

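I'm copying the code out of a PuTTY window, so if any braces are missing, please assume they're there. They all match in the actual code, but copying from nano over PuTTY is the worst.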

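I know it may look strange at first, but the rank 0 loop starts ahead of the lower ranks, so for the input file I'm testing, Bcast ends up being called 2 times on every task. The code runs across 16 tasks and passes only 26 lines.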
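Under the assumptions below, the count of two holds for each worker, but rank 0's side is worth checking too. This is a minimal sketch, a hypothetical pure-C model of the rank-0 loop (it makes no MPI calls; `trace_question_loop`, `bcast_trace` and `MAX_CALLS` are invented names), that records each value rank 0 would hand to MPI_Bcast. It assumes the for-loop bound equals the number of ranks (the posted code hard-codes 16) and that fgets reads every line of the file successfully.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CALLS 64

/* Hypothetical model of the rank-0 loop in the question. It makes no MPI
 * calls; it only records, in call order, the values rank 0 would pass to
 * MPI_Bcast. `lines` = lines fgets() can read, `tasks` = number of ranks. */
typedef struct {
    bool values[MAX_CALLS]; /* broadcast values, in call order */
    int root_calls;         /* MPI_Bcast calls made by rank 0 */
    int iterations;         /* outer while iterations; each worker makes
                               one MPI_Bcast call per completed iteration */
} bcast_trace;

bcast_trace trace_question_loop(int lines, int tasks)
{
    bcast_trace t = { .root_calls = 0, .iterations = 0 };
    int remaining = lines;
    int loop_count = 0;
    bool exit_flag = false;

    /* at most one call per line plus the final one must fit in values[] */
    assert(tasks > 1 && lines >= 0 && lines + 1 <= MAX_CALLS);

    while (!exit_flag) {
        for (int i = 0; i < tasks; i++) {
            if (remaining > 0) {           /* fgets() returned a line */
                remaining--;
                if (loop_count > 0)        /* Bcast(true) inside the for loop */
                    t.values[t.root_calls++] = true;
            } else {                       /* fgets() returned NULL */
                exit_flag = true;
            }
        }
        loop_count++;
        if (exit_flag)                     /* Bcast(false) just before break */
            t.values[t.root_calls++] = false;
    }
    t.iterations = loop_count;
    return t;
}
```

For the 26-line file on 16 ranks, the model gives 11 MPI_Bcast calls on rank 0 (ten `true` values from inside the for loop, then the final `false`) but only 2 per worker. Since collective calls are paired by order, a worker's second MPI_Bcast would meet rank 0's second call, which still carries `true`; that is consistent with the `continue: 1` lines in the output.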

There are no errors to report, but here is the tail of the printed output, with duplicate lines removed:

rank 0 recieved data from 9
Data:spec-56321-GAC099N59V1_sp01-042
continue after barrier:0
sent:0
continue: 1 (present 15 times)
files closed

The file that fgets is reading:

LAMOSTv108 spec-56321-GAC099N59V1_sp01-001.flx spec-56321-GAC099N59V1_sp01-001.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-003.flx spec-56321-GAC099N59V1_sp01-003.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-004.flx spec-56321-GAC099N59V1_sp01-004.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-005.flx spec-56321-GAC099N59V1_sp01-005.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-006.flx spec-56321-GAC099N59V1_sp01-006.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-008.flx spec-56321-GAC099N59V1_sp01-008.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-010.flx spec-56321-GAC099N59V1_sp01-010.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-013.flx spec-56321-GAC099N59V1_sp01-013.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-015.flx spec-56321-GAC099N59V1_sp01-015.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-018.flx spec-56321-GAC099N59V1_sp01-018.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-022.flx spec-56321-GAC099N59V1_sp01-022.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-023.flx spec-56321-GAC099N59V1_sp01-023.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-024.flx spec-56321-GAC099N59V1_sp01-024.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-025.flx spec-56321-GAC099N59V1_sp01-025.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-028.flx spec-56321-GAC099N59V1_sp01-028.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-029.flx spec-56321-GAC099N59V1_sp01-029.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-030.flx spec-56321-GAC099N59V1_sp01-030.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-031.flx spec-56321-GAC099N59V1_sp01-031.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-032.flx spec-56321-GAC099N59V1_sp01-032.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-033.flx spec-56321-GAC099N59V1_sp01-033.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-035.flx spec-56321-GAC099N59V1_sp01-035.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-037.flx spec-56321-GAC099N59V1_sp01-037.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-038.flx spec-56321-GAC099N59V1_sp01-038.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-039.flx spec-56321-GAC099N59V1_sp01-039.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-040.flx spec-56321-GAC099N59V1_sp01-040.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-042.flx spec-56321-GAC099N59V1_sp01-042.nor f

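And yes, I know I technically shouldn't use Continue as a variable name. It's capitalized, so it's distinct from the keyword continue, and I like variable names that make logical sense (otherwise I end up giving everything the clumsiest names).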

c raspberry-pi mpi
1 Answer (0 votes)

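You call MPI_Bcast on the root process (rank 0) inside conditional blocks, but on the other processes without any condition. So if the root's MPI_Bcast is not executed in a given iteration of the while loop, the other processes will wait forever. MPI pairs collective calls by the order in which each rank makes them, not by where they appear in the source, so every rank has to call MPI_Bcast the same number of times. In the posted code, the root's MPI_Bcast inside the for loop runs once for every line it reads after the first batch, while each worker calls MPI_Bcast once per while iteration; a worker's second call is therefore paired with one of the root's early `true` broadcasts rather than with the final `false`.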
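To make the point concrete, here is a hedged sketch of one way to restructure the rank-0 loop so that its broadcast count always equals the workers'. It is a pure-C model, not the asker's code: the Send/Recv/Barrier steps are left out, and `root_loop_fixed`, `bcast_hook`, `bcast_counter` and `count_bcast` are hypothetical names. The broadcast goes through a callback so the call pattern can be checked without an MPI runtime; in a real program the callback would wrap MPI_Bcast.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hook standing in for the broadcast. In the MPI program it
 * could be a wrapper such as (requires <mpi.h>):
 *   void mpi_bcast_hook(bool *flag, void *ctx)
 *   { (void)ctx; MPI_Bcast(flag, 1, MPI_C_BOOL, 0, MPI_COMM_WORLD); }
 */
typedef void (*bcast_hook)(bool *flag, void *ctx);

/* Sketch of a rank-0 loop that broadcasts the continue flag exactly once per
 * outer iteration. `lines` and `tasks` stand in for the input file and the
 * number of ranks. Returns the number of iterations, i.e. how many times each
 * worker reaches its single MPI_Bcast at the bottom of its own loop. */
int root_loop_fixed(int lines, int tasks, bcast_hook bcast, void *ctx)
{
    int remaining = lines;
    int iterations = 0;
    bool keep_going = true;

    assert(tasks > 1 && lines >= 0);
    while (keep_going) {
        for (int i = 0; i < tasks; i++) {
            if (remaining > 0)
                remaining--;   /* a real line goes to rank i (elided) */
            /* otherwise rank i would get the "e" placeholder (elided) */
        }
        iterations++;
        keep_going = remaining > 0;  /* decided once, after the whole batch */
        bcast(&keep_going, ctx);     /* one broadcast per iteration, no more */
    }
    return iterations;
}

/* Test double for the hook: counts calls and keeps the last value sent. */
typedef struct { int calls; bool last; } bcast_counter;

void count_bcast(bool *flag, void *ctx)
{
    bcast_counter *c = ctx;
    c->calls++;
    c->last = *flag;
}
```

Applied to the posted code, the same idea would mean deleting the MPI_Bcast inside the for loop (along with its loopCount test) and making the one after the second MPI_Barrier unconditional, with Continue set to !exitFlag, so that rank 0 and every worker each call MPI_Bcast exactly once per while iteration.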

© www.soinside.com 2019 - 2024. All rights reserved.