CUDA global function does not add array values correctly for some indices


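I am following this CUDA video tutorial on YouTube. The code is given in the second half of the video. It is a simple CUDA program that adds the elements of two arrays. So if we have a first array named a and a second array named b, the final value of a[i] is: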

a[i] += b[i];

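The problem is that, no matter what I do, the first four elements of the final output are always wrong. The program fills the arrays with random inputs in the range 0 to 1000, which means the final value at every index should be between 0 and 2000. However, regardless of the random seed, the first four results are always some combination of very large numbers and zeros.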

For indices greater than 3, the output looks fine. Here is my code:

#include <iostream>
#include <cuda.h>
#include <stdlib.h>
#include <ctime>

using namespace std;

__global__ void AddInts( int *a, int *b, int count){
  int id = blockIdx.x * blockDim.x +threadIdx.x;
  if (id < count){
    a[id] += b[id];
  }
}

int main(){
  srand(time(NULL));
  int count = 100;
  int *h_a = new int[count];
  int *h_b = new int[count];

  for (int i = 0; i < count; i++){ // Populate both arrays with 100 random values
    h_a[i] = rand() % 1000; // elements in range 0 to 999
    h_b[i] = rand() % 1000;
  }

  cout << "Prior to addition:" << endl;
  for (int i =0; i < 10; i++){ // Print out the first ten of each
    cout << h_a[i] << " " << h_b[i] << endl;
  }

  int *d_a, *d_b; //device copies of those arrays

  if(cudaMalloc(&d_a, sizeof(int) * count) != cudaSuccess) // allocate device memory for a
  {
    cout<<"Nope!";
    return -1;
  }
  if(cudaMalloc(&d_b, sizeof(int) * count) != cudaSuccess)
  {
    cout<<"Nope!";
    cudaFree(d_a);
    return -2;
  }

  if(cudaMemcpy(d_a, h_a, sizeof(int) * count, cudaMemcpyHostToDevice) != cudaSuccess)
  {
    cout << "Could not copy!" << endl;
    cudaFree(d_a);
    cudaFree(d_b);
    return -3;
  }
  if(cudaMemcpy(d_b, h_b, sizeof(int) * count, cudaMemcpyHostToDevice) != cudaSuccess)
  {
    cout << "Could not copy!" << endl;
    cudaFree(d_b);
    cudaFree(d_a);
    return -4;
  }

  AddInts<<<count / 256 +1, 256>>>(d_a, d_b, count); // integer division: enough 256-thread blocks to cover count

  if(cudaMemcpy(h_a, d_a, sizeof(int) * count, cudaMemcpyDeviceToHost)!= cudaSuccess)
  { // copy from device back to host
    delete[]h_a;
    delete[]h_b;
    cudaFree(d_a);
    cudaFree(d_b);
    cout << "Error: Copy data back to host failed" << endl;
    return -5;
  }
  delete[]h_a;
  delete[]h_b;
  cudaFree(d_a);
  cudaFree(d_b);

  for(int i = 0; i < 10; i++){
    cout<< "It's " << h_a[i] << endl;
  }

  return 0;
}

I compile with:

nvcc threads_blocks_grids.cu -o threads

The output of nvcc -version is:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

Here is my output:

Prior to addition:
771 177
312 257
303 5
291 819
735 359
538 404
718 300
540 943
598 456
619 180
It's 42984048
It's 0
It's 42992112
It's 0
It's 1094
It's 942
It's 1018
It's 1483
It's 1054
It's 799
c++ cuda nvidia nvcc cudnn
1 Answer

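The host arrays are deleted before they are printed. Reading h_a after delete[] is undefined behavior, which is why the first few values come out as garbage. Moving the printing loop above the delete[] calls should fix it.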
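As a rough sketch of that reordering, the program could look something like the following. The helper AddOnDevice is a hypothetical name introduced here to keep the example short; it is not part of the question's code or of the CUDA API. The kernel and the launch configuration are taken from the question unchanged. The only point being illustrated is that h_a is printed while it is still allocated, and freed afterwards.

```cuda
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <cuda_runtime.h>

// Same kernel as in the question: each thread adds one element of b into a.
__global__ void AddInts(int *a, int *b, int count) {
  int id = blockIdx.x * blockDim.x + threadIdx.x;
  if (id < count) {
    a[id] += b[id];
  }
}

// Hypothetical helper (not in the question): copies both host arrays to the
// device, runs the kernel, and copies the sums back into h_a.
// Returns true on success. The host arrays are NOT freed here.
bool AddOnDevice(int *h_a, int *h_b, int count) {
  int *d_a = nullptr;
  int *d_b = nullptr;
  size_t bytes = sizeof(int) * count;

  bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess &&
            cudaMalloc(&d_b, bytes) == cudaSuccess &&
            cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
            cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
  if (ok) {
    AddInts<<<count / 256 + 1, 256>>>(d_a, d_b, count);
    ok = cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
  }

  // Calling cudaFree on a null pointer performs no operation, so this is
  // safe even if one of the allocations above failed.
  cudaFree(d_a);
  cudaFree(d_b);
  return ok;
}

int main() {
  srand(time(NULL));
  const int count = 100;
  int *h_a = new int[count];
  int *h_b = new int[count];
  for (int i = 0; i < count; i++) {
    h_a[i] = rand() % 1000;
    h_b[i] = rand() % 1000;
  }

  bool ok = AddOnDevice(h_a, h_b, count);
  if (ok) {
    // Print first, while h_a still points to live memory.
    for (int i = 0; i < 10; i++) {
      printf("It's %d\n", h_a[i]);
    }
  } else {
    printf("Error: device addition failed\n");
  }

  // Only now release the host arrays.
  delete[] h_a;
  delete[] h_b;
  return ok ? 0 : -1;
}
```

With this ordering every printed value should be the sum of the two inputs at that index. In the original output, indices 4 and up already matched (for example 735 + 359 = 1094), which is consistent with only the start of the freed block having been overwritten.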
