CUDA register usage

Question

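The CUDA manual specifies the number of 32-bit registers per multiprocessor. Does this mean that:

1. A double variable takes two registers?
2. A pointer variable takes two registers? It has to be more than one register on Fermi with 6 GB of memory, right?
3. If the answer to question 2 is yes, it must be better to use fewer pointer variables and more int indices. For example, this kernel code:

float* p1;               // two regs
float* p2 = p1 + 1000;   // two regs
int i;                   // one reg
for ( i = 0; i < n; i++ )
{
    CODE THAT USES p1[i] and p2[i]
}

theoretically requires more registers than this kernel code:

float* p1;               // two regs
int i;                   // one reg
int j;                   // one reg
for ( i = 0, j = 1000; i < n; i++, j++ )
{
    CODE THAT USES p1[i] and p1[j]
}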

Answer 1

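The short answers to your three questions are:

1. Yes.
2. Yes, if the code is compiled for a 64-bit host operating system. In CUDA, the device pointer size always matches the host application's pointer size.
3. No.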
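Points 1 and 2 can be checked from device code. The following is a minimal sketch, not a definitive test: the kernel name typeSizes and the helper deviceTypeSizes are hypothetical names introduced here, and the "registers" figure is simply the size divided by four, i.e. how many 32-bit registers a live value of that type would occupy if the compiler keeps it in registers at all.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: records the device-side sizes of double and float*.
__global__ void typeSizes(unsigned int *out)
{
    out[0] = static_cast<unsigned int>(sizeof(double));
    out[1] = static_cast<unsigned int>(sizeof(float *));
}

// Hypothetical helper: copies both sizes back to the host.
// Returns false if any CUDA call fails.
bool deviceTypeSizes(unsigned int sizes[2])
{
    unsigned int *d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(unsigned int)) != cudaSuccess) {
        return false;
    }
    typeSizes<<<1, 1>>>(d_out);
    // cudaMemcpy waits for the kernel and reports any earlier launch error.
    cudaError_t err = cudaMemcpy(sizes, d_out, 2 * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess;
}

int main()
{
    unsigned int sizes[2] = {0, 0};
    if (!deviceTypeSizes(sizes)) {
        fprintf(stderr, "CUDA error while querying type sizes\n");
        return 1;
    }
    // Size in bytes divided by 4 gives the number of 32-bit registers.
    printf("double: %u bytes, %u 32-bit registers\n", sizes[0], sizes[0] / 4);
    printf("float*: %u bytes, %u 32-bit registers\n", sizes[1], sizes[1] / 4);
    printf("host pointer: %zu bytes\n", sizeof(float *));
    return 0;
}
```

Building the same file with -m64 and, on toolchains that still support it, -m32 should show the device pointer size following the host pointer size.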

To expand on point 3, consider the following two simple memory copy kernels:

__global__
void debunk(float *in, float *out, int n)
{
    int i = n * (threadIdx.x + blockIdx.x*blockDim.x);

    for(int j=0; j<n; j++) {
        out[i+j] = in[i+j];
    }
}

__global__
void debunk2(float *in, float *out, int n)
{
    int i = n * (threadIdx.x + blockIdx.x*blockDim.x);
    float *x = in + i;
    float *y = out + i;

    for(int j=0; j<n; j++, x++, y++) {
        *y = *x;  // copy from in to out, matching debunk
    }
}

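By your reckoning, debunk must use fewer registers because it has only two local integer variables, whereas debunk2 has two additional pointers. And yet, when I compile them with the CUDA 5 release toolchain: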

$ nvcc -m64 -arch=sm_20 -c -Xptxas="-v"  pointer_size.cu 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z6debunkPfS_i' for 'sm_20'
ptxas info    : Function properties for _Z6debunkPfS_i
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 8 registers, 52 bytes cmem[0]
ptxas info    : Compiling entry function '_Z7debunk2PfS_i' for 'sm_20'
ptxas info    : Function properties for _Z7debunk2PfS_i
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 8 registers, 52 bytes cmem[0]

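They compile to exactly the same register count. If you disassemble the toolchain output, you will see that, apart from the setup code, the final instruction streams are almost identical. There are a number of reasons for this, but it basically comes down to two simple rules:

1. Trying to determine the register count from C code (or even PTX assembly) is mostly futile.
2. Trying to second-guess a very sophisticated compiler and assembler is also mostly futile.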
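Rather than guessing, the register count can also be read back at run time. The sketch below is one possible way to do that, assuming the two kernels from this answer are in the same file (with debunk2 copying from in to out); it uses the runtime call cudaFuncGetAttributes and the numRegs field of cudaFuncAttributes. The main function and its messages are illustrative only, and the numbers printed depend on the toolchain and target architecture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Index-based copy kernel from the answer.
__global__ void debunk(float *in, float *out, int n)
{
    int i = n * (threadIdx.x + blockIdx.x * blockDim.x);

    for (int j = 0; j < n; j++) {
        out[i + j] = in[i + j];
    }
}

// Pointer-based copy kernel from the answer (copying in -> out).
__global__ void debunk2(float *in, float *out, int n)
{
    int i = n * (threadIdx.x + blockIdx.x * blockDim.x);
    float *x = in + i;
    float *y = out + i;

    for (int j = 0; j < n; j++, x++, y++) {
        *y = *x;
    }
}

int main()
{
    cudaFuncAttributes a1, a2;
    // Query the attributes of the compiled kernels, including registers per thread.
    if (cudaFuncGetAttributes(&a1, debunk) != cudaSuccess ||
        cudaFuncGetAttributes(&a2, debunk2) != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed\n");
        return 1;
    }
    printf("debunk : %d registers per thread\n", a1.numRegs);
    printf("debunk2: %d registers per thread\n", a2.numRegs);
    return 0;
}
```

This reports the same figure that ptxas prints with -Xptxas="-v", but for the binary that is actually loaded on the device.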
