Struggling with RSA encryption on CUDA

Question

I am trying to speed up RSA encryption using CUDA, but I cannot get the power-modulo operation to work correctly inside the kernel function.

I am compiling on AWS with the CUDA compilation tools, release 9.0, V9.0.176.

#include <cstdio>
#include <math.h>
#include "main.h"

// Kernel function to encrypt the message (m_in) elements into cipher (c_out)
__global__
void enc(int numElements, int e, int n, int *m_in, int *c_out)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;

    printf("e = %d, n = %d, numElements = %d\n", e, n, numElements);
    for (int i = index; i < numElements; i += stride)
    {
// POINT OF ERROR //
        // c_out[i] = (m_in[i]^e) % n;     //**GIVES WRONG RESULTS**
         c_out[i] = __pow(m_in[i], e) % n; //**GIVES, error: expression must have integral or enum type**
    }


}

// This function is called from main() from other file.
int* cuda_rsa(int numElements, int* data, int public_key, int key_length)
{
    int e = public_key;
    int n = key_length;

    // Allocate Unified Memory – accessible from CPU or GPU
    int* message_array;
    cudaMallocManaged(&message_array, numElements*sizeof(int));
    int* cipher_shared_array;       //Array shared by CPU and GPU
    cudaMallocManaged(&cipher_shared_array, numElements*sizeof(int));

    int* cipher_array = (int*)malloc(numElements * sizeof(int));

    //Put message array to be encrypted in a managed array
    for(int i=0; i<numElements; i++)
    {
        message_array[i] = data[i];
    }

    // Run kernel on 16M elements on the GPU
    enc<<<1, 1>>>(numElements, e, n, message_array, cipher_shared_array);

    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();

    //Copy into a host array and pass it to main() function for verification. 
    //Ignored memory leaks.
    for(int i=0; i<numElements; i++)
    {
        cipher_array[i] = cipher_shared_array[i];
    }
    return (cipher_array);
}

Please help me fix this error. How do I implement power-modulo (as shown below) in a CUDA kernel?

(x ^ y) % n;

I would really appreciate any help.

Answer 1

In C or C++, this:

(x^y)

does not raise x to the power of y. It performs a bitwise exclusive-or operation. That is why your first implementation does not give the correct answer.
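To make the difference concrete, here is a minimal sketch in plain C++. The names xor_demo and int_pow are hypothetical helpers introduced only for illustration; they do not appear in the question's code, and int_pow does no overflow checking.

```cpp
// In C and C++, ^ is bitwise XOR, not exponentiation.
// 5 is 101 in binary and 2 is 010, so 5 ^ 2 is 111, i.e. 7.
int xor_demo(int x, int y)
{
    return x ^ y;
}

// Hypothetical helper: integer power by repeated multiplication.
// No overflow checking; only meant for small illustrative values.
long long int_pow(long long base, unsigned exp)
{
    long long result = 1;
    for (unsigned i = 0; i < exp; ++i) {
        result *= base;
    }
    return result;
}
```

With these definitions, xor_demo(5, 2) evaluates to 7, while int_pow(5, 2) evaluates to 25.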

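In C or C++, the modulo operator:

%

is only defined for integer arguments. Even though you are passing integers to the __pow() function, the value it returns is a double (i.e. a floating-point quantity, not an integer).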

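I don't know the details of the math you need to perform, but if you cast the result of __pow to an int (for example), this compile error will disappear. That may or may not be valid for whatever arithmetic you wish to perform. (For example, you may wish to cast it to a "long" integer instead.)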

After you do that, you will run into another compile error. The easiest fix is to use pow() instead of __pow():

c_out[i] = (int)pow(m_in[i], e) % n;

If you were actually trying to use the CUDA fast-math intrinsic, you should use __powf instead of __pow:

c_out[i] = (int)__powf(m_in[i], e) % n;

Note that the fast-math intrinsics generally have reduced precision.

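Since these raise-to-a-power functions perform floating-point arithmetic (even though you are passing integers), you may get some unexpected results. For example, if you raise 5 to the power of 2, you could get 24.9999999999 instead of 25. If you simply cast that to an integer, it is truncated to 24. So you may need to explore rounding the result to the nearest integer instead of casting. But again, I have not studied the math you wish to perform.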
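Because pow() works in floating point, it also stops being exact once the intermediate power grows large, which happens quickly with RSA-sized exponents. One common way to avoid floating point entirely is square-and-multiply modular exponentiation, which reduces modulo n after every multiplication. The following is only a sketch under stated assumptions: mod_pow and the HOST_DEVICE macro are hypothetical names that are not part of the original post, and the inputs are assumed to be non-negative with n > 0 and n below 2^32, so that every intermediate product fits in 64 bits.

```cpp
#include <cstdint>

// Compiles as plain C++ on the host; when built with nvcc, the macro
// also makes the function callable from device code.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper (not from the original post): returns (base^exp) mod n
// using square-and-multiply with integer arithmetic only.
// Assumption: n > 0 and n < 2^32, so each product fits in a uint64_t.
HOST_DEVICE inline uint32_t mod_pow(uint32_t base, uint32_t exp, uint32_t n)
{
    uint64_t result = 1 % n;   // yields 0 when n == 1
    uint64_t b = base % n;     // reduce the base once up front
    while (exp > 0) {
        if (exp & 1u) {
            result = (result * b) % n;  // multiply step for a set exponent bit
        }
        b = (b * b) % n;                // square step
        exp >>= 1;                      // move to the next exponent bit
    }
    return static_cast<uint32_t>(result);
}
```

Under those assumptions, the line in the kernel could become c_out[i] = mod_pow(m_in[i], e, n); provided the message values, e, and n are all non-negative. Real RSA keys are far larger than 32 bits and would need a multi-precision integer type, which this sketch does not attempt.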
