将for循环转换为并行计算

问题描述 投票:1回答:1

我进行数值模拟。我有50,000个在一个盒子中结晶的粒子,其中有几个晶体(具有周期性边界条件)。我试图找到盒子里最大的水晶。该代码运行良好,但是在我的计算机上大约需要100 [s]。速率确定步骤是下面的函数find_neighbors()

是否有使用并行计算的方法来缩短计算时间?

我的计算机有12个核心。也许我们可以将阵列(中心)分成12个部分?

我的函数的输出是二维列表。

这里是find_neighbors()函数,编写时没有并行计算:

# Calculate if two particles are neighbors (Periodic Boundary Condition)
#              center is the coordinates of the particles
#              d      is the distance threshold for two clusters to be neighbors
#              Maxima_Range and
#              Lattice_Range are for periodic boundary conditions
#              Pos_3d is a list.
# Suppose particle i has neighbors j,k,l.
#              Pos_3d[i] = [j,k,l]

def find_neighbors(center,d,Maxima_Range,Lattice_Range):
    a,b = np.shape(center)          # get the dimension of the array
    flag_3d = np.zeros(a)           # count the number of neighbors for particle i
    Pos_3d = [[] for i in range(a)] # Pos_3d is a two dimmensional list. 
    #                               # Pos_3d[i] includes the indices of all the neighbors of particle i

    diff_sqrt = d ** 0.5
    for i in range(a):
        if i % 6 == 5:              # for some reason, when i % 6 == 5,
            #                       #     it is the center coordinates,
            #                       #     this cannot be changed.
            for j in range(i+1,a):
                if j % 6 == 5:

                    diff_x = center[i][0]-center[j][0]
                    diff_y = center[i][1]-center[j][1]
                    diff_z = center[i][2]-center[j][2]

                    if (   diff_x <  diff_sqrt
                       and diff_y <  diff_sqrt
                       and diff_z <  diff_sqrt
                       and diff_x > -diff_sqrt
                       and diff_y > -diff_sqrt
                       and diff_z > -diff_sqrt ):
                           #        # if one of the x,y,z component is already
                           #        #    bigger than d, we don't need
                           #        #    to calculate their distance in 3-D.
                           #        #    They are not neighbors.

                        diff = ( diff_x * diff_x
                               + diff_y * diff_y
                               + diff_z * diff_z
                                 )

                        if diff <= d: # without period boundary condition
                            flag_3d[i] +=1
                            flag_3d[j] +=1
                            Pos_3d[i].append(j)
                            Pos_3d[j].append(i)
                            continue


                        # With periodic boundary condition
                        # in x

                        do something..

    return Pos_3d
python performance parallel-processing processing-efficiency parallelism-amdahl
1个回答
0
投票

Q“是否有[[一种使用并行计算的方法来缩短计算时间?“

是,有。

但是,在任何人开始尝试之前,请修复您的原始代码,以避免任何和所有开销:

最好避免使用列表,但最好附加numpy可矢量化和高效处理的数据结构:

aVectorOfDIFFs = center[i] - center[j] # aVectorOfDIFFs.shape ( 3, ) # _x, _y, _z if d >= np.dot( aVectorOfDIFFs, # diff == ( ( _x * _x ) aVectorOfDIFFs.T # + ( _y * _y ) ): # + ( _z * _z ) ) # makes sense to do anything ... with ( i, j ) ...

除了对代码进行智能的自上而下的重构到性能最佳的

numpy

-向量化处理(如果对最终性能感兴趣,我们可以稍后再讨论),只是简单的串联原始的[[for-loops效率很低,大部分时间都花在了其中,python多年来一直对此有所了解,缓慢的循环迭代。它可以按原样运行~ 20x更快:

>>> from zmq import Stopwatch; aClk = Stopwatch() >>> def aPairOfNaiveFORs( a = 1E6 ): ... intA = int( a ) # a gets fused into (int) ... for i in range( intA ):# original, keep looping all i-s ... if i % 6 == 5: # if i-th "center": ... for j in range( i+1, intA ):# keep looping all j-s ... if j % 6 == 5: # if j-th "center": ... # NOP, # do some usefull work ... # not important here # ... # for the testing, the overhead comparison matters ... continue # loop next ... return >>> aClk.start(); _ = aPairOfNaiveFORs( 1E3 ); aClk.stop() 14770 [us] for 1E3 ~ 13+ [us] per one item of a[:1000] 14248 13430 13252 >>> aClk.start(); _ = aPairOfNaiveFORs( 1E5 ); aClk.stop() 64559866 [us] for 1E5 ~ 645+ [us] per one item of a[:10000]
现在,避免所有不符合5 == <index> modulo 6的荒谬循环。>
>>> def aPairOfLessNaiveFORs( a = 1E6 ): ... intA = int( a ) ... for i in range ( 5, intA, 6 ): # yes, start with the 5-th item of the first matching "center" ... for j in range( i+6, intA, 6 ): # yes, keep matching ( modulo 6 ) steps ... continue

测试显示

~ +13x ~ +17x
更快的代码(避免5/6循环):

>>> aClk.start(); _ = aPairOfLessNaiveFORs( 1E3 ); aClk.stop() 1171 [us] for 1E3 ~ 1 [us] per one item of a[:1000] 976 1157 1181 1185 >>> aClk.start(); _ = aPairOfLessNaiveFORs( 1E5 ); aClk.stop() 3725167 [us] for 1E5 ~ 37 [us] per one item of a[:10000]

RESUME:

只是避免主开销的这种精确设计将缩短在大约5E5结晶中心(即3E6坐标)左右缩放的处理过程的循环。

>>> aClk.start(); _ = aPairOfLessNaiveFORs( 5E5 ); aClk.stop() 100104334 [us] for 5E5 ~ 17.6 x FASTER >>> aClk.start(); _ = aPairOfNaiveFORs( 5E5 ); aClk.stop() 1759420300 [us] for 5E5 ~ ^^^^^^^^^^^^^

甚至是it easily need not get any "positive"-boost进入并行计算模型(记住corrected Amdahl's Law,这样做的附加开销也不会被忽略或忽略)。

© www.soinside.com 2019 - 2024. All rights reserved.