C++: How can I elegantly use C++17 parallel execution with a for loop that counts integers?

Question

I can do this:

std::vector<int> a;
a.reserve(1000);
for(int i=0; i<1000; i++)
    a.push_back(i);
std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int i) {
  // ... do something based on i ...
});

But is there a more elegant way to create a for(int i=0; i ...

You can use std::generate to create the vector {0, 1, ..., 999}:

std::vector<int> v(1000);
std::generate(v.begin(), v.end(), [n = 0] () mutable { return n++; });

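There is an overload that accepts an ExecutionPolicy, so syntactically you can change the above to the version below. Be careful, though: under a parallel policy the stateful lambda may be copied and invoked from several threads at once, so the counter is racy and the result is not guaranteed to be {0, 1, ..., 999}. The sequential version above is the safe one.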

std::vector<int> v(1000);
std::generate(std::execution::par, v.begin(), v.end(), [n = 0] () mutable { return n++; });

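While I can't suggest a way to avoid filling a vector, I can recommend the std::iota() function as (perhaps) the most efficient/elegant way to fill one with incrementing integers: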

std::vector<int> a(1000);
std::iota(std::begin(a), std::end(a), 0);
std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int i) {
  // ... do something based on i ...
});
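
To make "do something based on i" concrete, here is a minimal sketch in which every index writes only to its own output slot, so no synchronisation is needed even under par_unseq. The function name squares_by_index and the choice of computing squares are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical example: out[i] = i * i for i in [0, n).
inline std::vector<long long> squares_by_index(int n)
{
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);   // idx = {0, 1, ..., n - 1}

    std::vector<long long> out(n);
    // Each invocation touches a distinct element of out, so the
    // invocations do not race with each other.
    std::for_each(std::execution::par_unseq, idx.begin(), idx.end(),
                  [&out](int i) { out[i] = static_cast<long long>(i) * i; });
    return out;
}
```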

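The complexity of std::iota is exactly last - first increments and assignments, whereas the complexity of std::generate is last - first invocations of g() and assignments. Even if a decent compiler were to inline a simple increment lambda for g, the iota syntax is considerably simpler, IMHO.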
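
The two sequential fills compared above can be put side by side as follows. This is just a sketch; the function names indices_with_iota and indices_with_generate are hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Fills with std::iota: one increment and one assignment per element.
inline std::vector<int> indices_with_iota(int n)
{
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    return v;
}

// Fills with std::generate: one call of the generator per element.
// This is the sequential overload, so the stateful lambda is safe here.
inline std::vector<int> indices_with_generate(int n)
{
    std::vector<int> v(n);
    std::generate(v.begin(), v.end(), [k = 0]() mutable { return k++; });
    return v;
}
```

Both produce the same sequence; the iota version simply states the intent more directly.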

Here are three ways to do this without pre-populating a vector just to store a sequence of integers.

You can use Boost.counting_range (or use Boost.counting_iterator directly, as you prefer) ... although good luck working out how from the documentation.

 auto range = boost::counting_range<int>(0,1000);
 std::for_each(std::execution::par_unseq,
               range.begin(),
               range.end(),
               [&](int i) {
                   //  ... do something based on i ...
               });

If you don't want to pull in Boost, we can write a simple version ourselves.

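With no apologies for mashing iota and iterator together instead of coming up with a decent name, the following lets you write something similar to the Boost version above: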

 std::for_each(std::execution::par_unseq,
               ioterable<int>(0),
               ioterable<int>(1000),
               [&](int i) {
                 //  ... do something based on i ...
               }
 );

You can see how much boilerplate using Boost saves you here:

 template <typename NumericType>
 struct ioterable
 {
     // Parallel algorithms require at least forward iterators.
     using iterator_category = std::forward_iterator_tag;
     using value_type = NumericType;
     using difference_type = NumericType;
     using pointer = std::add_pointer_t<NumericType>;
     using reference = NumericType;

     explicit ioterable(NumericType n) : val_(n) {}

     ioterable() = default;
     ioterable(ioterable&&) = default;
     ioterable(ioterable const&) = default;
     ioterable& operator=(ioterable&&) = default;
     ioterable& operator=(ioterable const&) = default;

     ioterable& operator++() { ++val_; return *this; }
     ioterable operator++(int) { ioterable tmp(*this); ++val_; return tmp; }
     bool operator==(ioterable const& other) const { return val_ == other.val_; }
     bool operator!=(ioterable const& other) const { return val_ != other.val_; }

     value_type operator*() const { return val_; }

 private:
     NumericType val_{ std::numeric_limits<NumericType>::max() };
 };
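
As a self-contained illustration of the same idea, the sketch below pairs a stripped-down counting iterator with std::transform_reduce, so that nothing is materialised in a vector. The names index_iterator and sum_of_squares are hypothetical, the iterator is fixed to int for brevity, and it declares the forward category because the parallel overloads require forward iterators. Whether the work is actually split across threads is up to the implementation; some may fall back to serial execution for iterators that are not random access.

```cpp
#include <cstddef>
#include <execution>
#include <functional>
#include <iterator>
#include <numeric>

// Hypothetical minimal counting iterator over int (illustration only).
struct index_iterator
{
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = int;   // values are returned by copy

    int value = 0;

    int operator*() const { return value; }
    index_iterator& operator++() { ++value; return *this; }
    index_iterator operator++(int) { index_iterator tmp = *this; ++value; return tmp; }
    friend bool operator==(index_iterator a, index_iterator b) { return a.value == b.value; }
    friend bool operator!=(index_iterator a, index_iterator b) { return a.value != b.value; }
};

// Sums i * i for i in [0, n) without storing the indices anywhere.
inline long long sum_of_squares(int n)
{
    return std::transform_reduce(
        std::execution::par,
        index_iterator{0}, index_iterator{n},
        0LL,                 // initial value of the reduction
        std::plus<>{},       // how partial results are combined
        [](int i) { return static_cast<long long>(i) * i; });
}
```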

For posterity: if you have access to C++20 in the future, std::ranges::iota_view will be the better choice.

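Visual C++ provides a rich parallel programming environment, the Concurrency Runtime (ConCRT).
You can use OpenMP, which is an open standard but is also available in ConCRT. As Wikipedia puts it, this kind of problem is embarrassingly parallel. The following code runs the 1000 outer iterations in parallel, split across a team of worker threads (typically about one per core, not 1000 separate threads):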

#include <omp.h>
...
#pragma omp parallel for
for(int s = 0; s < 1000; s++)
{
    for(int i = 0; i < s; i++)
    {
        // ... do something parallel based on i ...
    }
}

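If the compiler option /openmp is not specified, the #pragma omp directives are ignored. To be honest, I don't quite understand the role of your vector, so I left it out. I also don't see the reason for replacing a standard for with any kind of for_each that works on stored indices, since the for loop does the job well.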
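
To make the OpenMP loop above concrete, here is a minimal sketch that sums the loop indices with a reduction clause. The function name sum_of_indices is hypothetical. Because an unrecognised pragma is ignored, the same code also compiles and gives the same result when OpenMP is switched off; it then simply runs serially.

```cpp
// Hypothetical example: returns 0 + 1 + ... + (n - 1).
long long sum_of_indices(int n)
{
    long long total = 0;
    // With OpenMP enabled (/openmp on MSVC, -fopenmp on GCC/Clang) each
    // thread accumulates a private copy of total and the copies are
    // combined at the end. Without it, this is an ordinary serial loop.
    #pragma omp parallel for reduction(+ : total)
    for (int s = 0; s < n; s++)
    {
        total += s;
    }
    return total;
}
```

The reduction clause matters here: adding to a shared total from several threads without it would be a data race.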
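Alternatively, you can use the Microsoft-specific PPL library. The following code likewise runs 1000 iterations in parallel (scheduled as tasks on worker threads rather than as 1000 separate threads), generating the indices 0 through 999 (inclusive) and passing each one to the parallel routine as the variable s: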

#include <ppl.h>
...
using namespace concurrency;
parallel_for(0, 1000, [&](int s)
{
   for(int i = 0; i < s; i++)
   {
      // ... do something parallel based on i ...
   }
});

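For massively parallel computations, AMP is also available alongside the Concurrency Runtime. AMP executes parallel routines on the GPU rather than the CPU.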

