Numba CUDA: using shared memory to add numbers results in overwriting

Question

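I have been trying to add numbers using shared memory, as follows:

Thread 0: adds 1 to the shared memory variable sharedMemT[0]

Thread 1: adds 1 to the shared memory variable sharedMemT[0]

Synchronize the threads and store sharedMemT[0] in output[0]

But the result is... 1??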

import numpy as np
from numba import cuda, int32

@cuda.jit()
def add(output):
    sharedMemT = cuda.shared.array(shape=(1), dtype=int32)
    sharedMemT[0] = 0
    cuda.syncthreads()

    sharedMemT[0] += 1
    cuda.syncthreads()
    output[0] = sharedMemT[0]

out = np.array([0])
add[1, 2](out)
print(out) # results in [1]

Answer 1

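Congratulations, you have a memory race. Threads 0 and 1 run at the same time, so the result is undefined both for the operation on the shared memory variable and for the write-back to global memory.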
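
The lost update can be reproduced without a GPU. What follows is a minimal pure-Python sketch, not Numba code: it assumes that `sharedMemT[0] += 1` breaks down into a separate load, add, and store, and it replays by hand one possible interleaving of two threads. The names `interleaved_increment`, `shared`, `reg_t0`, and `reg_t1` are hypothetical and chosen only for illustration.

```python
def interleaved_increment():
    """Replay one interleaving of two threads running `shared[0] += 1`."""
    shared = [0]              # stands in for sharedMemT after initialisation

    # Both threads load before either one stores.
    reg_t0 = shared[0]        # thread 0 reads 0
    reg_t1 = shared[0]        # thread 1 reads 0

    reg_t0 += 1               # thread 0 computes 1 in its own register
    reg_t1 += 1               # thread 1 computes 1 in its own register

    shared[0] = reg_t0        # thread 0 stores 1
    shared[0] = reg_t1        # thread 1 stores 1, overwriting thread 0's update
    return shared[0]


result = interleaved_increment()
print(result)  # 1
```

One of the two increments is lost because each thread works on a stale copy of the value. On real hardware the interleaving is not guaranteed, which is why the outcome is undefined rather than reliably 1.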

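To make this work correctly, you need to use an atomic memory operation to serialize access to the shared memory variable, and then have only one thread write back to global memory: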

$ cat atomic.py

import numpy as np
from numba import cuda, int32

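@cuda.jit()
def add(output):
    sharedMemT = cuda.shared.array(shape=(1), dtype=int32)
    # Global thread index; with a single block it equals the index within the block.
    pos = cuda.grid(1)
    # Only one thread initialises the shared counter.
    if pos == 0:
        sharedMemT[0] = 0

    # Make sure the initialisation is visible before any thread adds to it.
    cuda.syncthreads()

    # Atomic read-modify-write: the increments are serialized, none is lost.
    cuda.atomic.add(sharedMemT, 0, 1)
    # Wait until every thread has added its contribution.
    cuda.syncthreads()

    # Only one thread writes the final value back to global memory.
    if pos == 0:
        output[0] = sharedMemT[0]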

out = np.array([0])
add[1, 2](out)
print(out)

$ python atomic.py
[2]
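
Why the atomic version always yields the number of threads can be sketched with a small pure-Python model. This is an illustration under stated assumptions, not a description of how the hardware implements atomics: `atomic_add` and `run_block` are hypothetical helpers, `atomic_add` is assumed to perform the load, add, and store as one indivisible step, and the loop over permutations stands in for the unknown order in which threads reach the atomic.

```python
from itertools import permutations


def atomic_add(array, index, value):
    """Model of an atomic add: read-modify-write as one indivisible step.

    Returns the old value, mirroring what cuda.atomic.add returns.
    """
    old = array[index]
    array[index] = old + value
    return old


def run_block(thread_order):
    """Run one simulated block; each thread performs one atomic add."""
    shared = [0]
    for _thread_id in thread_order:
        atomic_add(shared, 0, 1)
    return shared[0]


# Every arrival order of 3 threads gives the same total.
totals = {run_block(order) for order in permutations(range(3))}
print(totals)  # {3}
```

Because no thread can observe the counter between another thread's load and store, the order of arrival no longer affects the total.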
