Disclaimer: I am new to OpenMP. Initially I thought I was doing something wrong in my OpenMP code (which may still be the case), but now the problem seems to be related to how I compile with OpenMP, because no matter what I do, I am running on a single thread.

I have the following subroutine, which is the stepper function in my larger DEM program. It computes every interaction between particles `i` and `j` and then updates the forces. Before moving to a more sophisticated search algorithm, I am first trying to parallelize this naive O(N^2) loop. However, I cannot find a way to make it work, meaning that whatever I do, only one thread runs.

Also, note that I use `include` for my variables because I have many variables that need to be accessed and modified by many subroutines, and it felt like the simple way to get the job done; I am open to suggestions.

Here is my subroutine:
subroutine stepper (tstep)

    use omp_lib

    implicit none

    include "parameter.h"
    include "CB_variables.h"
    include "CB_const.h"
    include "CB_bond.h"
    include "CB_forcings.h"

    integer :: i, j
    integer, intent(in) :: tstep

    ! reinitialize force arrays for contact and bonds
    do i = 1, n
        mc(i)  = 0d0
        mb(i)  = 0d0
        fcx(i) = 0d0
        fcy(i) = 0d0
        fbx(i) = 0d0
        fby(i) = 0d0
    end do

    ! put yourself in the referential of the ith particle
    ! loop through all j particles and compute interactions

    !$omp parallel do schedule(dynamic) &
    !$omp private(i,j) &
    !$omp reduction(+:tfx,tfy,fcx,fcy,fbx,fby,m,mc,mb)
    do i = 1, n
        do j = i + 1, n

            ! compute relative position and velocity
            call rel_pos_vel (i, j)

            ! bond initialization
            if ( tstep .eq. 1 ) then
                if ( -deltan(i, j) .le. 5d-1 * r(i)) then ! can be fancier
                    !bond (i, j) = 1
                end if
                call bond_properties (i, j)
            end if

            ! verify if two particles are colliding
            if ( deltan(i,j) .gt. 0 ) then

                call contact_forces (i, j)
                !call bond_creation (i, j) ! to implement

                ! update contact force on particle i by particle j
                fcx(i) = fcx(i) - fcn(i,j) * cosa(i,j)
                fcy(i) = fcy(i) - fcn(i,j) * sina(i,j)

                ! update moment on particle i by particle j due to tangent contact
                mc(i) = mc(i) - r(i) * fct(i,j) - mcc(i,j)

                ! Newton's third law
                ! update contact force on particle j by particle i
                fcx(j) = fcx(j) + fcn(i,j) * cosa(i,j)
                fcy(j) = fcy(j) + fcn(i,j) * sina(i,j)

                ! update moment on particle j by particle i due to tangent contact
                mc(j) = mc(j) - r(j) * fct(i,j) + mcc(i,j)

            end if

            ! compute forces from bonds between particle i and j
            if ( bond (i, j) .eq. 1 ) then

                call bond_forces (i, j)
                !call bond_breaking (i, j)

                ! update force on particle i by particle j due to bond
                fbx(i) = fbx(i) - fbn(i,j) * cosa(i,j) + &
                                  fbt(i,j) * sina(i,j)
                fby(i) = fby(i) - fbn(i,j) * sina(i,j) - &
                                  fbt(i,j) * cosa(i,j)

                ! update moment on particle i by particle j due to bond
                mb(i) = mb(i) - r(i) * fbt(i,j) - mbb(i, j)

                ! Newton's third law
                ! update force on particle j by particle i due to bond
                fbx(j) = fbx(j) + fbn(i,j) * cosa(i,j) - &
                                  fbt(i,j) * sina(i,j)
                fby(j) = fby(j) + fbn(i,j) * sina(i,j) + &
                                  fbt(i,j) * cosa(i,j)

                ! update moment on particle j by particle i due to bond
                mb(j) = mb(j) - r(i) * fbt(i,j) + mbb(j, i)

            end if

            ! compute sheltering height for particle j on particle i for air and water drag
            call sheltering(i, j)

        end do

        ! compute the total forcing from winds, currents and coriolis on particle i
        call forcing (i)
        !call coriolis(i)

        ! reinitialize total force arrays before summing everything
        m(i)   = 0d0
        tfx(i) = 0d0
        tfy(i) = 0d0

        ! sum all forces together on particle i
        tfx(i) = fcx(i) + fbx(i) + fax(i) + fwx(i)
        tfy(i) = fcy(i) + fby(i) + fay(i) + fwy(i)

        ! sum all moments on particle i together
        m(i) = mc(i) + mb(i) + ma(i) + mw(i)

    end do
    !$omp end parallel do

    ! forces on side particles for experiments
    call experiment_forces

    ! integration in time
    call velocity
    call euler

end subroutine stepper
`rel_pos_vel(i, j)` computes the relative positions, velocities, angles, etc. between particles `i` and `j`. Everything inside it works on 2D arrays from common blocks. The same goes for `bond_properties(i, j)`. `contact_forces(i, j)` uses some local variables as well as some arrays from common blocks; `bond_forces(i, j)` is the same thing, and so are `sheltering` and `forcing`.
In fact, when printing the number of threads inside the parallel region (using `omp_get_num_threads()`), the output is 1, indicating that no parallelization is taking place. The same thing happens with `omp_in_parallel()`, which returns `.false.`.
Furthermore, when trying to parallelize just the first initialization loop as a first step toward understanding:
!$omp do private(i) num_threads(10)
print*, omp_get_num_threads()
do i = 1, n
    mc(i)  = 0d0
    mb(i)  = 0d0
    fcx(i) = 0d0
    fcy(i) = 0d0
    fbx(i) = 0d0
    fby(i) = 0d0
end do
!$omp parallel do
it is still not running in parallel: the output of `omp_get_num_threads` is 1, even though the number of threads is explicitly set to 10.
I am compiling with gfortran through SCons; my build file looks like this:
from glob import glob

# ----------------------------------
# Standard Compilation Environment
# ----------------------------------
include = Dir("#/include")
libs = Dir("#/libs")

FC = "gfortran"

env = Environment(
    LIBPATH=[libs],
    FORTRANMODDIR=[include],
    FORTRANMODDIRPREFIX="-J",
    FORTRANPATH=[include],
    FORTRANFLAGS=[
        "-Wall",
        "-Wno-tabs",
        "-fPIC",
        "-fbackslash",
        "-O3",
        "-ffast-math",
        "-mcmodel=medium",
        "-fopenmp",
    ],
    LINKFLAGS=["-fopenmp"],
    FORTRAN=FC,
)
# Command line options to modify the build environment.
# Use scons-3 debug=1 to compile with debugging flags.
DEBUG = ARGUMENTS.get("debug", 0)
EXE = ARGUMENTS.get("exe", "godar")

if int(DEBUG):
    env.Append(FORTRANFLAGS="-g")
    env.Append(FORTRANFLAGS="-static")
    env.Append(FORTRANFLAGS="-fbacktrace")
    env.Append(LINKFLAGS="-fbacktrace")
# ----------------------------------------
# Create Executables
# ----------------------------------------
# Export the compilation environment so SConscript files in subdirectories can access it.
Export("env")
Export("EXE")
# The datetime module
datetime = SConscript("datetime/SConscript", variant_dir="build/datetime")
# The GoDAR library and executable
godar = SConscript("src/SConscript", variant_dir="build/godar")
As I wrote in the comments: you are updating the `tfx` and `tfy` arrays before the end of the `i` do loop, but only with the `i` index. So everything that was computed with the `j` index is missing there. You should either update them inside the `j` do loop with the `j` index as well, or update them in a separate loop after the parallel loop.
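To see the difference concretely, here is a small Python sketch (a hypothetical toy model, not the actual DEM code) that mimics what an OpenMP `reduction(+:...)` does over the pair loop: each "thread" accumulates into its own private copy, and the copies are summed at the end. A total that is set inside the loop with only the `i` index misses the `j` contributions that landed in the other thread's private copy, while a total computed after the reduction is correct.

```python
# Toy model of OpenMP reduction semantics over a pair loop (i, j > i).
# Hypothetical stand-in for the DEM force arrays; not the actual program.
N = 8

def serial():
    f = [0] * N                      # pair contributions, like fcx
    tf = [0] * N                     # totals, like tfx
    for i in range(N):
        for j in range(i + 1, N):
            f[i] += 1                # force on i from j
            f[j] += 1                # Newton's third law: force on j from i
    for i in range(N):               # totals computed after the pair loop
        tf[i] = f[i]
    return tf

def reduction(update_tf_inside):
    # Two "threads", odd/even i iterations, each with private copies
    # (this is what reduction(+:f,tf) gives each thread).
    priv = [{"f": [0] * N, "tf": [0] * N} for _ in range(2)]
    for i in range(N):
        p = priv[i % 2]
        for j in range(i + 1, N):
            p["f"][i] += 1
            p["f"][j] += 1
        if update_tf_inside:
            p["tf"][i] = p["f"][i]   # WRONG: other thread's f[i] is missing
    f = [priv[0]["f"][k] + priv[1]["f"][k] for k in range(N)]    # reduction
    tf = [priv[0]["tf"][k] + priv[1]["tf"][k] for k in range(N)]
    if not update_tf_inside:
        tf = f[:]                    # FIX: compute totals after the reduction
    return f, tf

f, tf_bad = reduction(update_tf_inside=True)
_, tf_good = reduction(update_tf_inside=False)
print(f == serial())        # True: the reduction itself is fine
print(tf_bad == serial())   # False: totals set inside the loop are wrong
print(tf_good == serial())  # True: totals set after the loop are correct
```

The same reasoning applies to `tfx`, `tfy` and `m` in the stepper: they combine per-`i` and per-`j` contributions, so they must be finalized only once all pair contributions have been reduced.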
Here is some demo code:
program foo
implicit none
integer, parameter :: N = 40
integer :: s(N), t(N), i, j

s(:) = 0
t(:) = 0
!$ call omp_set_num_threads(2)
!$OMP PARALLEL DO PRIVATE(j) REDUCTION(+:s,t) SCHEDULE(dynamic,1)
do i = 1, N
    do j = i+1, N
        s(i) = s(i)+1
        s(j) = s(j)+1
    end do
    t(i) = s(i)
end do
!$OMP END PARALLEL DO

write(*,*) "s array:"
write(*,"(10I5)") s(:)
write(*,*) "t array:"
write(*,"(10I5)") t(:)
end program
If I run it without OpenMP (or, optionally, with a single thread), the output is as expected:
s array:
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
t array:
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
Now, if I run it with OpenMP, I get garbage in `t(:)`:
s array:
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
t array:
39 38 38 37 37 36 36 35 35 34
34 33 33 32 32 31 31 30 30 29
29 28 28 27 27 26 26 25 25 24
24 23 23 22 22 21 21 20 20 19
The solution is obviously to update `t` only after the parallel loop:
program foo
implicit none
integer, parameter :: N = 40
integer :: s(N), t(N), i, j

s(:) = 0
!$ call omp_set_num_threads(2)
!$OMP PARALLEL DO PRIVATE(j) REDUCTION(+:s) SCHEDULE(dynamic,1)
do i = 1, N
    do j = i+1, N
        s(i) = s(i)+1
        s(j) = s(j)+1
    end do
end do
!$OMP END PARALLEL DO
t(:) = s(:)

write(*,*) "s array:"
write(*,"(10I5)") s(:)
write(*,*) "t array:"
write(*,"(10I5)") t(:)
end program
In the example above (with OpenMP and the reduction on `t`), let us assume that thread 0 processes the odd `i` iterations and thread 1 the even `i` iterations. Because of the reduction, `s` and `t` are implicitly private inside the OpenMP region: let us denote the private versions `s0`, `t0`, `s1`, `t1`.
Thread 0, at the end of the `j` loop of iteration `i = 1`:

s0(:) = [N-1, 1, 1, ..., 1]
t0(:) = [N-1, 0, 0, ..., 0]

Thread 1, at the end of the `j` loop of iteration `i = 2`:

s1(:) = [0, N-2, 1, ..., 1]
t1(:) = [0, N-2, 0, ..., 0]

Thread 0, at the end of the `j` loop of iteration `i = 3`:

s0(:) = [N-1, 1, N-2, 2, 2, ..., 2]
t0(:) = [N-1, 0, N-2, 0, 0, ..., 0]

Thread 1, at the end of the `j` loop of iteration `i = 4`:

s1(:) = [0, N-2, 1, N-3, 2, ..., 2]
t1(:) = [0, N-2, 0, N-3, 0, ..., 0]

...

Thread 0, at the end of the `j` loop of iteration `i = N-1`:

s0(:) = [N-1, 1, N-2, 2, N-3, 3, N-4, 4, ..., N/2-1, N/2]
t0(:) = [N-1, 0, N-2, 0, N-3, 0, N-4, 0, ..., N/2-1, 0]

Thread 1, at the end of the `j` loop of iteration `i = N`:

s1(:) = [0, N-2, 1, N-3, 1, N-4, ..., N/2, N/2-1]
t1(:) = [0, N-2, 0, N-3, 0, N-4, ..., 0, N/2]
At the end of the parallel region, the reduction consists of `s = s0 + s1` and `t = t0 + t1`. You can see that everything works out for `s`, but not for `t`.