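I am trying to run a matrix multiplication program with OpenMP. The serial and parallel versions give me different output. I am testing with 3 × 3 matrices.
My parallel code is: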
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define NRA 3//62 /* number of rows in matrix A */
#define NCA 3//15 /* number of columns in matrix A */
#define NCB 3//7 /* number of columns in matrix B */
int main (int argc, char *argv[])
{
    int tid, nthreads, i, j, k, chunk;
    double a[NRA][NCA],   /* matrix A to be multiplied */
           b[NCA][NCB],   /* matrix B to be multiplied */
           c[NRA][NCB];   /* result matrix C */

    chunk = 10;           /* set loop iteration chunk size */

    /*** Spawn a parallel region explicitly scoping all variables ***/
    #pragma omp parallel shared(a,b,c,nthreads,chunk) private(tid,i,j,k)
    {
        tid = omp_get_thread_num();
        if (tid == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Starting matrix multiple example with %d threads\n",nthreads);
            printf("Initializing matrices...\n");
        }

        /*** Initialize matrices ***/
        #pragma omp for schedule (static, chunk)
        for (i=0; i<NRA; i++)
            for (j=0; j<NCA; j++)
                a[i][j]= i+j;
        #pragma omp for schedule (static, chunk)
        for (i=0; i<NCA; i++)
            for (j=0; j<NCB; j++)
                b[i][j]= i*j;
        #pragma omp for schedule (static, chunk)
        for (i=0; i<NRA; i++)
            for (j=0; j<NCB; j++)
                c[i][j]= 0;

        /*** Do matrix multiply sharing iterations on outer loop ***/
        /*** Display who does which iterations for demonstration purposes ***/
        printf("Thread %d starting matrix multiply...\n",tid);
        #pragma omp for schedule (static, chunk)
        for (i=0; i<NRA; i++)
        {
            printf("Thread=%d did row=%d\n",tid,i);
            for(j=0; j<NCB; j++)
                for (k=0; k<NCA; k++)
                    c[i][j] += a[i][k] * b[k][j];
        }
    } /*** End of parallel region ***/
    /*** Print results ***/
    /* Matrix A is NRA x NCA, so the column loop runs to NCA */
    printf("******************************************************\n");
    printf("Result Matrix:\n");
    for (i=0; i<NRA; i++)
    {
        for (j=0; j<NCA; j++)
            printf("%6.2f ", a[i][j]);
        printf("\n");
    }
    printf("******************************************************\n");
    /* Matrix B is NCA x NCB, so the row loop runs to NCA */
    printf("******************************************************\n");
    printf("Result Matrix:\n");
    for (i=0; i<NCA; i++)
    {
        for (j=0; j<NCB; j++)
            printf("%6.2f ", b[i][j]);
        printf("\n");
    }
    printf("******************************************************\n");
    /* Matrix C = A * B is NRA x NCB */
    printf("******************************************************\n");
    printf("Result Matrix:\n");
    for (i=0; i<NRA; i++)
    {
        for (j=0; j<NCB; j++)
            printf("%6.2f ", c[i][j]);
        printf("\n");
    }
    printf("******************************************************\n");
    printf ("Done.\n");
}
For the serial version, I just commented out this line:
#pragma omp for schedule (static, chunk)
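The output of my parallel version is:
Starting matrix multiple example with 12 threads
Initializing matrices...
Thread 0 starting matrix multiply...
Thread 8 starting matrix multiply...
Thread 6 starting matrix multiply...
Thread 9 starting matrix multiply...
Thread 5 starting matrix multiply...
Thread 1 starting matrix multiply...
Thread 4 starting matrix multiply...
Thread 7 starting matrix multiply...
Thread 10 starting matrix multiply...
Thread 3 starting matrix multiply...
Thread 2 starting matrix multiply...
Thread=0 did row=0
Thread=0 did row=1
Thread=0 did row=2
Thread 11 starting matrix multiply...
******************************************************
Result Matrix:
  0.00   1.00   2.00
  1.00   2.00   3.00
  2.00   3.00   4.00
******************************************************
Result Matrix:
  0.00   0.00   0.00
  0.00   1.00   2.00
  0.00   2.00   4.00
******************************************************
Result Matrix:
  0.00   5.00  10.00
  0.00   8.00  16.00
  0.00  11.00  22.00
******************************************************
Done.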
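The output of my serial version is:
Starting matrix multiple example with 12 threads
Initializing matrices...
Thread 0 starting matrix multiply...
Thread 3 starting matrix multiply...
Thread 5 starting matrix multiply...
Thread 11 starting matrix multiply...
Thread 1 starting matrix multiply...
Thread 10 starting matrix multiply...
Thread 2 starting matrix multiply...
Thread 9 starting matrix multiply...
Thread 7 starting matrix multiply...
Thread 8 starting matrix multiply...
Thread 4 starting matrix multiply...
Thread 6 starting matrix multiply...
******************************************************
Result Matrix:
  0.00   1.00   2.00
  1.00   2.00   3.00
  2.00   3.00   4.00
******************************************************
Result Matrix:
  0.00   0.00   0.00
  0.00   1.00   2.00
  0.00   2.00   4.00
******************************************************
Result Matrix:
  0.00  60.00 120.00
  0.00  96.00 192.00
  0.00 132.00 264.00
******************************************************
Done.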
How can I fix this?
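One way to read the two result matrices above: every nonzero entry of the second one is exactly 12 times the matching entry of the first (60 = 12 × 5, 96 = 12 × 8, and so on), and 12 is the reported thread count. That is the pattern you would expect if each of the 12 threads ran the whole multiply loop instead of a share of it. The sketch below imitates that effect in plain serial C, with no OpenMP involved. The names `init_inputs`, `accumulate_product`, and `repeats` are made up for this illustration, and the model assumes no update is lost to a race, which a real multithreaded run does not guarantee.

```c
#define N 3  /* matrix dimension used in the question */

/* Fill a and b the same way the question does: a[i][j] = i+j, b[i][j] = i*j. */
void init_inputs(double a[N][N], double b[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = i * j;
        }
}

/* Zero c once, then run the full multiply loop `repeats` times.
 * repeats = 1 models a correct run; repeats = 12 models 12 threads that
 * each execute every iteration (assumption: no update is lost). */
void accumulate_product(int repeats, double c[N][N])
{
    double a[N][N], b[N][N];
    init_inputs(a, b);

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = 0.0;

    for (int r = 0; r < repeats; r++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    c[i][j] += a[i][k] * b[k][j];
}
```

Calling `accumulate_product(1, c)` reproduces the 5 / 8 / 11 matrix, and `accumulate_product(12, c)` reproduces the 60 / 96 / 132 matrix, which suggests the "serial" build was still running the loop once per thread.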
I found the mistake. For the serial version I had not commented things out properly. I overlooked this line:
#pragma omp parallel shared(a,b,c,nthreads,chunk) private(tid,i,j,k)
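With that `#pragma omp parallel` line still active and the `omp for` lines removed, the loops are no longer divided among the threads, so every thread in the region executes all of them and the sums in `c` pile up. A less error-prone way to get a serial run is to leave the source alone and build without the OpenMP flag, since a compiler without OpenMP enabled ignores the pragmas. The following is a minimal sketch of that idea, not the original program: `multiply` and `max_threads` are hypothetical helper names, and the `_OPENMP` guard is there so the file still compiles and links when OpenMP is off.

```c
#ifdef _OPENMP
#include <omp.h>   /* only needed when OpenMP is enabled */
#endif

#define NRA 3  /* number of rows in matrix A */
#define NCA 3  /* number of columns in matrix A */
#define NCB 3  /* number of columns in matrix B */

/* Threads the runtime may use; reports 1 in a build without OpenMP. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* c = a * b. The combined "parallel for" opens the region and splits the
 * rows of c among the threads in one directive, so the two cannot be
 * separated by accident. Without OpenMP the pragma is ignored and the
 * loop runs serially. Loop variables declared inside the loops are
 * private to each thread, and each c[i][j] is written by one thread. */
void multiply(double a[NRA][NCA], double b[NCA][NCB], double c[NRA][NCB])
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < NRA; i++)
        for (int j = 0; j < NCB; j++) {
            double sum = 0.0;
            for (int k = 0; k < NCA; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}
```

With GCC, for example, compiling this with `-fopenmp` gives the parallel version and compiling it without that flag gives the serial one. Both should produce the same `c` for the same inputs.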