When using Thrust, is it legit to create a std::array inside a __host__ __device__ functor?

Question

I wrote some toy code to test a few ideas:

#include <thrust/transform.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/iterator/counting_iterator.h>
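#include <thrust/execution_policy.h>
#include <thrust/fill.h>   // thrust::fill
#include <thrust/copy.h>   // thrust::copy
#include <iostream>
#include <iterator>        // std::ostream_iterator
#include <cstdio>          // printf
#include <array>
#include <vector>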

#define N 20

struct func {
        __host__ __device__
        float operator()(float x) { return x*2; }
};

template <typename S>
struct O {
        const std::array<float,2> a;
        O(std::array<float,2> a): a(a) {}

        S f;
        __host__ __device__
        float operator()(float &v) {
                std::array<int,3> b = {2,3,4};
                int tmp;
                for (int i=0; i<3; i++) {
                        tmp = thrust::reduce(thrust::device,b.begin(),b.end(),0);
                        printf("%d",tmp);
                }
                return a[0]*v + a[1] + f(a[0]);
        }
};

int main(void) {

        thrust::host_vector<float> _v1(N);
        thrust::device_vector<float> v1 = _v1, v2;
        thrust::fill(v1.begin(),v1.end(),12);
        v2.resize(N);

        std::array<float,2> a{1,2};
        auto c_itor = thrust::make_counting_iterator(0);
        thrust::transform(v1.begin(),v1.end(),v2.begin(),O<func>(a));

        thrust::copy(v2.begin(),v2.end(),std::ostream_iterator<float>(std::cout," "));

}

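This code runs perfectly when compiled with

nvcc --expt-relaxed-constexpr -std=c++17

As you can see, standard containers such as std::array appear inside a __host__ __device__ functor. What I want to know is:

1. Is writing it this way legit? (In terms of efficiency, not syntactic validity.)
2. Since the code runs correctly, where are the std objects stored? (On the device or on the host?)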

Answer 1

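std::array is a special case. With C++17 or later and --expt-relaxed-constexpr (which lets device code call constexpr functions that carry no __device__ annotation), your code works because std::array is a very thin wrapper around a C-style array, and as of C++17 all the member functions you use are constexpr. I think every member function except std::array::fill and std::array::swap is constexpr as of C++17. Those two got the constexpr treatment in C++20.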
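As a rough illustration of the constexpr point, the following host-only sketch forces the std::array operations used in the question (size(), operator[], begin(), end()) to run at compile time. The names kB, sum_by_index and sum_by_iterator are made up for this example, and it assumes a compiler in C++17 mode or later.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the local array b in the question's functor.
constexpr std::array<int, 3> kB{2, 3, 4};

// Uses size() and the const operator[], both usable in constant expressions.
constexpr int sum_by_index(const std::array<int, 3>& b) {
    int total = 0;
    for (std::size_t i = 0; i < b.size(); ++i) {
        total += b[i];
    }
    return total;
}

// Uses begin()/end(), which became constexpr in C++17. These are the calls
// the question's functor passes to thrust::reduce.
constexpr int sum_by_iterator(const std::array<int, 3>& b) {
    int total = 0;
    for (auto it = b.begin(); it != b.end(); ++it) {
        total += *it;
    }
    return total;
}

// Both checks are evaluated by the compiler, so they only build if every
// std::array member function called above is constexpr.
static_assert(sum_by_index(kB) == 9, "index-based access is constexpr");
static_assert(sum_by_iterator(kB) == 9, "iterator access is constexpr");
```

If either static_assert compiles, the corresponding member functions are constexpr in the standard library being used, which is the property --expt-relaxed-constexpr relies on in device code.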

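So, performance-wise, your code should behave the same as if it used float a[2] and int b[3]. This means the values are kept in registers where possible (this depends on loop unrolling and on overall register pressure). That is fine as long as you don't go overboard with the array sizes. See e.g. this answer for a deeper discussion of arrays, registers and local memory.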
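To make the comparison with C-style arrays concrete, here is a minimal sketch of two functors that compute the same thing, one with a std::array member and one with a raw array. The struct names and the HD macro are hypothetical and not part of the question's code. The macro expands to nothing under a plain host compiler so the snippet also builds without nvcc.

```cpp
#include <array>

// Expands to the CUDA qualifiers under nvcc and to nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical variant of the question's functor that keeps std::array.
// In device code this relies on C++17 and nvcc's --expt-relaxed-constexpr.
struct AffineStdArray {
    std::array<float, 2> a;
    HD float operator()(float v) const { return a[0] * v + a[1]; }
};

// Same computation with a C-style array member. No standard-library
// function is called here, so the relaxed-constexpr flag is not needed.
struct AffineCArray {
    float a[2];
    HD float operator()(float v) const { return a[0] * v + a[1]; }
};
```

Under nvcc, either struct could presumably be passed to thrust::transform the same way O<func>(a) is in the question, for example as AffineStdArray{{1.0f, 2.0f}}.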

Other containers/alternatives:

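For other STL containers that use dynamic memory, you will probably not be as lucky in terms of their member functions being constexpr. The HPC nvc++ compiler (formerly the PGI C++ compiler) does not need __device__ markers, so in theory one can use a lot more STL functionality in device code, but in most cases that is a bad idea in terms of performance. STL functions also still have to conform to CUDA's C++ Language Restrictions.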

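Nvidia is developing its own C++ standard library implementation, with its own device extensions, in libcu++. For now it has no containers, but they might come in the future. For hash tables there is the cuCollections library (WIP).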
