Thread performance in Julia

Question (votes: 1, answers: 1)

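My attempt to run Julia code in parallel does not get any faster as the number of threads increases.

Whether I set JULIA_NUM_THREADS to 2 or 32, the code below runs in about the same time.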

using Random
using Base.Threads

rmax = 10
dr = 1
Ngal = 100000000

function bin(id, Njobs, x, y, z, w)
    bin_array = zeros(10)
    for i in (id-1)*Njobs + 1:id*Njobs
        r = sqrt(x[i]^2 + y[i]^2 + z[i]^2)
        i_bin = floor(Int, r/dr) + 1
        if i_bin < 10
            bin_array[i_bin] += w[i]
        end
    end
    bin_array
end

Nthreads = nthreads()

x = rand(Ngal)*5
y = rand(Ngal)*5
z = rand(Ngal)*5
w = ones(Ngal)

V = let
    VV = [zeros(10) for _ in 1:Nthreads]
    jobs_per_thread = fill(div(Ngal, Nthreads),Nthreads)
    for i in 1:Ngal-sum(jobs_per_thread)
        jobs_per_thread[i] += 1
    end
    @threads for i = 1:Nthreads
        tid = threadid()
        VV[tid] = bin(tid, jobs_per_thread[tid], x, y, z, w)
    end
    reduce(+, VV)
end

Am I doing something wrong?

julia
1 answer
0 votes

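Compared with the rest of the work, the time spent in the threaded loop is negligible. Most of the runtime goes into generating and allocating the input arrays, which happens on a single thread. You also allocate an array whose size depends on the number of threads, so with several threads you even spend (slightly) more time on memory allocation.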
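One way to check this claim is to time the two phases separately. The following is a minimal sketch, not the asker's code: the function name time_phases and its return fields are hypothetical, and it assumes dr = 1 and 10 bins as in the question.

```julia
using Base.Threads

# Hypothetical helper for illustration: time data generation and the threaded
# loop separately. Returns both durations in seconds plus the summed bins.
function time_phases(n)
    t0 = time_ns()
    x = rand(n) .* 5
    y = rand(n) .* 5
    z = rand(n) .* 5
    w = ones(n)
    t1 = time_ns()

    nchunks = nthreads()
    partial = Vector{Vector{Float64}}(undef, nchunks)
    @threads for c in 1:nchunks
        # Contiguous chunk of indices handled by iteration c.
        lo = div((c - 1) * n, nchunks) + 1
        hi = div(c * n, nchunks)
        bins = zeros(10)
        for i in lo:hi
            r = sqrt(x[i]^2 + y[i]^2 + z[i]^2)
            i_bin = floor(Int, r) + 1   # dr = 1, as in the question
            if i_bin < 10
                bins[i_bin] += w[i]
            end
        end
        partial[c] = bins
    end
    t2 = time_ns()

    return (setup = (t1 - t0) / 1e9, loop = (t2 - t1) / 1e9, bins = reduce(+, partial))
end
```

Calling time_phases(10^6) twice and reading the second result avoids counting compilation time. If setup dominates loop, adding threads cannot shorten the total by much.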


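If you care about performance, have a look at https://docs.julialang.org/en/v1/manual/performance-tips/. In particular, avoid non-constant global variables (they kill performance) and put everything into functions, which are also easier to test and debug. For example, I rewrote your code as: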

using Random
using Base.Threads

function bin(id, Njobs, x, y, z, w)
    dr = 1

    bin_array = zeros(10)
    for i in (id-1)*Njobs + 1:id*Njobs
        r = sqrt(x[i]^2 + y[i]^2 + z[i]^2)
        i_bin = floor(Int, r/dr) + 1
        if i_bin < 10
            bin_array[i_bin] += w[i]
        end
    end
    bin_array
end

function test()
    Ngal = 100000000
    x = rand(Ngal)*5
    y = rand(Ngal)*5
    z = rand(Ngal)*5
    w = ones(Ngal)

    Nthreads = nthreads()
    VV = [zeros(10) for _ in 1:Nthreads]
    jobs_per_thread = fill(div(Ngal, Nthreads),Nthreads)
    for i in 1:Ngal-sum(jobs_per_thread)
        jobs_per_thread[i] += 1
    end
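    @threads for i = 1:Nthreads
        # Index by the loop variable i rather than threadid(): iteration i is
        # not guaranteed to run on thread i under every @threads schedule.
        VV[i] = bin(i, jobs_per_thread[i], x, y, z, w)
    end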
    reduce(+, VV)
end

test()

Performance with one thread:

julia> @time test();
  3.054144 seconds (33 allocations: 5.215 GiB, 11.03% gc time)

Performance with 4 threads:

julia> @time test();
  2.602698 seconds (65 allocations: 5.215 GiB, 9.92% gc time)

If I comment out the for loop in test(), I get the following timings. One thread:

julia> @time test();
  2.444296 seconds (21 allocations: 5.215 GiB, 10.54% gc time)

4 threads:

julia> @time test();
  2.481054 seconds (27 allocations: 5.215 GiB, 12.08% gc time)
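To see the threads actually help, the binning can be timed on its own with the arrays created beforehand. The sketch below is one possible restructuring under stated assumptions, not the original author's method: bin_range, chunk_ranges and threaded_bin are hypothetical names, and dr and the bin count are passed as arguments instead of being read from globals. It also computes explicit index ranges, because the expression (id-1)*Njobs + 1:id*Njobs in the code above lines up only when every chunk has the same length.

```julia
using Base.Threads

# Serial kernel: accumulate weights into radial bins for the indices in `range`.
function bin_range(range, x, y, z, w, dr, nbins)
    bins = zeros(nbins)
    for i in range
        r = sqrt(x[i]^2 + y[i]^2 + z[i]^2)
        i_bin = floor(Int, r / dr) + 1
        if i_bin < nbins   # same cutoff as the code above (last bin stays empty)
            bins[i_bin] += w[i]
        end
    end
    return bins
end

# Split 1:n into `nchunks` contiguous ranges whose lengths differ by at most one.
function chunk_ranges(n, nchunks)
    base, extra = divrem(n, nchunks)
    ranges = Vector{UnitRange{Int}}(undef, nchunks)
    start = 1
    for c in 1:nchunks
        len = base + (c <= extra ? 1 : 0)
        ranges[c] = start:(start + len - 1)
        start += len
    end
    return ranges
end

# Threaded driver: one partial histogram per chunk, indexed by the loop
# variable so the result does not depend on which thread runs which iteration.
function threaded_bin(x, y, z, w; dr = 1.0, nbins = 10, nchunks = nthreads())
    ranges = chunk_ranges(length(x), nchunks)
    partial = Vector{Vector{Float64}}(undef, nchunks)
    @threads for c in 1:nchunks
        partial[c] = bin_range(ranges[c], x, y, z, w, dr, nbins)
    end
    return reduce(+, partial)
end
```

A possible usage: build x, y, z and w once, call threaded_bin(x, y, z, w) a first time to compile it, then run @time threaded_bin(x, y, z, w) under different JULIA_NUM_THREADS settings. Only the loop is measured this way, so any speedup from extra threads is no longer hidden behind the array allocation.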