I have the code below, where I am trying to increment a pointer to a struct called SL. In the situation below, how can I increment that same value atomically, so as to avoid a race condition? I am not concerned with parallel performance in this case.
__global__ void insertKernel(struct SlabList* head_ref, int* new_key,
                             int* new_val, int size, struct SlabList* SL,
                             struct SlabList* temp)
{
    int id = blockIdx.x*blockDim.x + threadIdx.x;
    if (id < size/SLAB_SIZE) {
        head_ref = NULL;
        struct SlabList* new_node = (struct SlabList*)malloc(sizeof(struct SlabList));
        for (int j = 0; j < SLAB_SIZE; j++) {
            new_node->key[j] = new_key[id*SLAB_SIZE+j];
            new_node->val[j] = new_val[id*SLAB_SIZE+j];
        }
        new_node->next = head_ref;
        memcpy(SL, new_node, size * sizeof(struct SlabList));
        head_ref = new_node;
        SL++; // How to perform this atomically?
    }
}
I looked at CUDA's atomicInc and atomicAdd APIs, but because they take different argument types, I could not apply them to a pointer.
By my reading, there are two operations that must be performed atomically for this to work correctly (without changing the structure of the code): the increment of SL that you highlighted, and the exchange of the head_ref pointer value as the list grows.

If (and only if) you are using a 64-bit operating system, then something like this might work:
__global__ void insertKernel(struct SlabList* head_ref, int* new_key,
                             int* new_val, int size, struct SlabList* SL,
                             struct SlabList* temp)
{
    int id = blockIdx.x*blockDim.x + threadIdx.x;
    if (id < size/SLAB_SIZE) {
        struct SlabList* new_node = (struct SlabList*)malloc(sizeof(struct SlabList));

        // Atomically reserve this thread's slot in the SL array:
        // a 64-bit atomicAdd on the pointer's bit pattern, advancing it by one slab
        struct SlabList* SLnew = (struct SlabList*)atomicAdd((unsigned long long*)&SL,
                                                             sizeof(struct SlabList));

        // Atomically swap the new node into the head of the list,
        // capturing the previous head for linking
        struct SlabList* oldhead = (struct SlabList*)atomicExch((unsigned long long*)&head_ref,
                                                                (unsigned long long)new_node);
        for (int j = 0; j < SLAB_SIZE; j++) {
            new_node->key[j] = new_key[id*SLAB_SIZE+j];
            new_node->val[j] = new_val[id*SLAB_SIZE+j];
        }
        new_node->next = oldhead;
        memcpy(SLnew, new_node, sizeof(struct SlabList));
    }
}
[Note: never compiled or run, let alone tested. Use at your own risk]