Disclaimer: I am new to OpenMP. Initially I thought I was doing something wrong in my OpenMP code (which may still be the case), but now the problem seems to be related to how I compile with OpenMP, because no matter what I do, I am running on a single thread.

I have the following subroutine, which is the stepper function in my larger DEM program. It computes every interaction between particles `i` and `j` and then updates the forces. Before moving to a more sophisticated search algorithm, I am first trying to parallelize this naive O(N^2) loop. However, I cannot find a way to make it work, meaning that whatever I do, only one thread runs.

Also, note that I use `include` for my variables because I have many variables that need to be accessed and modified by many subroutines, and it felt like the simple way to get the job done; I am open to suggestions.

Here is my subroutine:
subroutine stepper (tstep)

    use omp_lib

    implicit none

    include "parameter.h"
    include "CB_variables.h"
    include "CB_const.h"
    include "CB_bond.h"
    include "CB_forcings.h"

    integer :: i, j
    integer, intent(in) :: tstep

    ! reinitialize force arrays for contact and bonds
    do i = 1, n
        mc(i)  = 0d0
        mb(i)  = 0d0
        fcx(i) = 0d0
        fcy(i) = 0d0
        fbx(i) = 0d0
        fby(i) = 0d0
    end do

    ! put yourself in the referential of the ith particle
    ! loop through all j particles and compute interactions

    !$omp parallel do schedule(dynamic) &
    !$omp private(i,j) &
    !$omp reduction(+:tfx,tfy,fcx,fcy,fbx,fby,m,mc,mb)
    do i = 1, n
        do j = i + 1, n

            ! compute relative position and velocity
            call rel_pos_vel (i, j)

            ! bond initialization
            if ( tstep .eq. 1 ) then
                if ( -deltan(i, j) .le. 5d-1 * r(i)) then ! can be fancier
                    !bond (i, j) = 1
                end if
                call bond_properties (i, j)
            end if

            ! verify if two particles are colliding
            if ( deltan(i,j) .gt. 0 ) then

                call contact_forces (i, j)
                !call bond_creation (i, j) ! to implement

                ! update contact force on particle i by particle j
                fcx(i) = fcx(i) - fcn(i,j) * cosa(i,j)
                fcy(i) = fcy(i) - fcn(i,j) * sina(i,j)

                ! update moment on particle i by particle j due to tangent contact
                mc(i) = mc(i) - r(i) * fct(i,j) - mcc(i,j)

                ! Newton's third law
                ! update contact force on particle j by particle i
                fcx(j) = fcx(j) + fcn(i,j) * cosa(i,j)
                fcy(j) = fcy(j) + fcn(i,j) * sina(i,j)

                ! update moment on particle j by particle i due to tangent contact
                mc(j) = mc(j) - r(j) * fct(i,j) + mcc(i,j)

            end if

            ! compute forces from bonds between particle i and j
            if ( bond (i, j) .eq. 1 ) then

                call bond_forces (i, j)
                !call bond_breaking (i, j)

                ! update force on particle i by particle j due to bond
                fbx(i) = fbx(i) - fbn(i,j) * cosa(i,j) + &
                                  fbt(i,j) * sina(i,j)
                fby(i) = fby(i) - fbn(i,j) * sina(i,j) - &
                                  fbt(i,j) * cosa(i,j)

                ! update moment on particle i by particle j due to bond
                mb(i) = mb(i) - r(i) * fbt(i,j) - mbb(i, j)

                ! Newton's third law
                ! update force on particle j by particle i due to bond
                fbx(j) = fbx(j) + fbn(i,j) * cosa(i,j) - &
                                  fbt(i,j) * sina(i,j)
                fby(j) = fby(j) + fbn(i,j) * sina(i,j) + &
                                  fbt(i,j) * cosa(i,j)

                ! update moment on particle j by particle i due to bond
                mb(j) = mb(j) - r(i) * fbt(i,j) + mbb(j, i)

            end if

            ! compute sheltering height for particle j on particle i for air and water drag
            call sheltering(i, j)

        end do

        ! compute the total forcing from winds, currents and coriolis on particle i
        call forcing (i)
        !call coriolis(i)

        ! reinitialize total force arrays before summing everything
        m(i)   = 0d0
        tfx(i) = 0d0
        tfy(i) = 0d0

        ! sum all forces together on particle i
        tfx(i) = fcx(i) + fbx(i) + fax(i) + fwx(i)
        tfy(i) = fcy(i) + fby(i) + fay(i) + fwy(i)

        ! sum all moments on particle i together
        m(i) = mc(i) + mb(i) + ma(i) + mw(i)

    end do
    !$omp end parallel do

    ! forces on side particles for experiments
    call experiment_forces

    ! integration in time
    call velocity
    call euler

end subroutine stepper
`rel_pos_vel(i, j)` computes the relative positions, velocities, angles, etc. between particles `i` and `j`. Everything inside it works on 2D arrays from common blocks. The same goes for `bond_properties(i, j)`. `contact_forces(i, j)` uses some local variables as well as some arrays from common blocks; `bond_forces(i, j)` is the same thing, and so are `sheltering` and `forcing`.
In fact, when printing the number of threads inside the parallel region (using `omp_get_num_threads()`), the output is 1, indicating that no parallelization is taking place. The same thing happens with `omp_in_parallel()`, which returns `.false.`.
Furthermore, when trying to parallelize just the first initialization loop as a first step toward understanding:
!$omp do private(i) num_threads(10)
print*, omp_get_num_threads()
do i = 1, n
    mc(i)  = 0d0
    mb(i)  = 0d0
    fcx(i) = 0d0
    fcy(i) = 0d0
    fbx(i) = 0d0
    fby(i) = 0d0
end do
!$omp parallel do
it is still not running in parallel: the output of `omp_get_num_threads` is 1, even though the number of threads is explicitly set to 10.
I am compiling with gfortran through SCons; my build file looks like this:
from glob import glob

# ----------------------------------
# Standard Compilation Environment
# ----------------------------------
include = Dir("#/include")
libs = Dir("#/libs")

FC = "gfortran"

env = Environment(
    LIBPATH=[libs],
    FORTRANMODDIR=[include],
    FORTRANMODDIRPREFIX="-J",
    FORTRANPATH=[include],
    FORTRANFLAGS=[
        "-Wall",
        "-Wno-tabs",
        "-fPIC",
        "-fbackslash",
        "-O3",
        "-ffast-math",
        "-mcmodel=medium",
        "-fopenmp",
    ],
    LINKFLAGS=["-fopenmp"],
    FORTRAN=FC,
)
# Command line options to modify the build environment.
# Use scons-3 debug=1 to compile with debugging flags.
DEBUG = ARGUMENTS.get("debug", 0)
EXE = ARGUMENTS.get("exe", "godar")

if int(DEBUG):
    env.Append(FORTRANFLAGS="-g")
    env.Append(FORTRANFLAGS="-static")
    env.Append(FORTRANFLAGS="-fbacktrace")
    env.Append(LINKFLAGS="-fbacktrace")
# ----------------------------------------
# Create Executables
# ----------------------------------------
# Export the compilation environment so SConscript files in subdirectories can access it.
Export("env")
Export("EXE")
# The datetime module
datetime = SConscript("datetime/SConscript", variant_dir="build/datetime")
# The GoDAR library and executable
godar = SConscript("src/SConscript", variant_dir="build/godar")
As I wrote in the comments: you are updating the `tfx` and `tfy` arrays before the end of the `i` do loop, but only with the `i` index. So everything that was computed with the `j` index is missing there. You should either update them inside the `j` do loop with the `j` index as well, or update them in a separate loop after the parallel loop.
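To see the difference concretely, here is a small Python sketch (a hypothetical toy model, not the actual DEM code) that mimics what an OpenMP `reduction(+:...)` does over the pair loop: each "thread" accumulates into its own private copy, and the copies are summed at the end. A total that is set inside the loop with only the `i` index misses the `j` contributions that landed in the other thread's private copy, while a total computed after the reduction is correct.

```python
# Toy model of OpenMP reduction semantics over a pair loop (i, j > i).
# Hypothetical stand-in for the DEM force arrays; not the actual program.
N = 8

def serial():
    f = [0] * N                      # pair contributions, like fcx
    tf = [0] * N                     # totals, like tfx
    for i in range(N):
        for j in range(i + 1, N):
            f[i] += 1                # force on i from j
            f[j] += 1                # Newton's third law: force on j from i
    for i in range(N):               # totals computed after the pair loop
        tf[i] = f[i]
    return tf

def reduction(update_tf_inside):
    # Two "threads", odd/even i iterations, each with private copies
    # (this is what reduction(+:f,tf) gives each thread).
    priv = [{"f": [0] * N, "tf": [0] * N} for _ in range(2)]
    for i in range(N):
        p = priv[i % 2]
        for j in range(i + 1, N):
            p["f"][i] += 1
            p["f"][j] += 1
        if update_tf_inside:
            p["tf"][i] = p["f"][i]   # WRONG: other thread's f[i] is missing
    f = [priv[0]["f"][k] + priv[1]["f"][k] for k in range(N)]    # reduction
    tf = [priv[0]["tf"][k] + priv[1]["tf"][k] for k in range(N)]
    if not update_tf_inside:
        tf = f[:]                    # FIX: compute totals after the reduction
    return f, tf

f, tf_bad = reduction(update_tf_inside=True)
_, tf_good = reduction(update_tf_inside=False)
print(f == serial())        # True: the reduction itself is fine
print(tf_bad == serial())   # False: totals set inside the loop are wrong
print(tf_good == serial())  # True: totals set after the loop are correct
```

The same reasoning applies to `tfx`, `tfy` and `m` in the stepper: they combine per-`i` and per-`j` contributions, so they must be finalized only once all pair contributions have been reduced.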
Here is some demo code:
program foo
implicit none
integer, parameter :: N = 40
integer :: s(N), t(N), i, j

s(:) = 0
t(:) = 0
!$ call omp_set_num_threads(2)
!$OMP PARALLEL DO PRIVATE(j) REDUCTION(+:s,t) SCHEDULE(dynamic,1)
do i = 1, N
    do j = i+1, N
        s(i) = s(i)+1
        s(j) = s(j)+1
    end do
    t(i) = s(i)
end do
!$OMP END PARALLEL DO

write(*,*) "s array:"
write(*,"(10I5)") s(:)
write(*,*) "t array:"
write(*,"(10I5)") t(:)
end program
If I run it without OpenMP (or, optionally, with a single thread), the output is as expected:
s array:
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
t array:
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
Now, if I run it with OpenMP, I get garbage in `t(:)`:
s array:
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
39 39 39 39 39 39 39 39 39 39
t array:
39 38 38 37 37 36 36 35 35 34
34 33 33 32 32 31 31 30 30 29
29 28 28 27 27 26 26 25 25 24
24 23 23 22 22 21 21 20 20 19
The solution is obviously to update `t` only after the parallel loop:
program foo
implicit none
integer, parameter :: N = 40
integer :: s(N), t(N), i, j

s(:) = 0
!$ call omp_set_num_threads(2)
!$OMP PARALLEL DO PRIVATE(j) REDUCTION(+:s) SCHEDULE(dynamic,1)
do i = 1, N
    do j = i+1, N
        s(i) = s(i)+1
        s(j) = s(j)+1
    end do
end do
!$OMP END PARALLEL DO
t(:) = s(:)

write(*,*) "s array:"
write(*,"(10I5)") s(:)
write(*,*) "t array:"
write(*,"(10I5)") t(:)
end program
In the example above (with OpenMP and the reduction on `t`), let us assume that thread 0 processes the odd `i` iterations and thread 1 the even `i` iterations. Because of the reduction, `s` and `t` are implicitly private inside the OpenMP region: let us denote the private versions `s0`, `t0`, `s1`, `t1`.
Thread 0, at the end of the `j` loop of iteration `i = 1`:

s0(:) = [N-1, 1, 1, ..., 1]
t0(:) = [N-1, 0, 0, ..., 0]

Thread 1, at the end of the `j` loop of iteration `i = 2`:

s1(:) = [0, N-2, 1, ..., 1]
t1(:) = [0, N-2, 0, ..., 0]

Thread 0, at the end of the `j` loop of iteration `i = 3`:

s0(:) = [N-1, 1, N-2, 2, 2, ..., 2]
t0(:) = [N-1, 0, N-2, 0, 0, ..., 0]

Thread 1, at the end of the `j` loop of iteration `i = 4`:

s1(:) = [0, N-2, 1, N-3, 2, ..., 2]
t1(:) = [0, N-2, 0, N-3, 0, ..., 0]

...

Thread 0, at the end of the `j` loop of iteration `i = N-1`:

s0(:) = [N-1, 1, N-2, 2, N-3, 3, N-4, 4, ..., N/2-1, N/2]
t0(:) = [N-1, 0, N-2, 0, N-3, 0, N-4, 0, ..., N/2-1, 0]

Thread 1, at the end of the `j` loop of iteration `i = N`:

s1(:) = [0, N-2, 1, N-3, 1, N-4, ..., N/2, N/2-1]
t1(:) = [0, N-2, 0, N-3, 0, N-4, ..., 0, N/2]
At the end of the parallel region, the reduction consists of `s = s0 + s1` and `t = t0 + t1`. You can see that everything works out for `s`, but not for `t`.