Setting CPU affinity when creating a thread

Question

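I want to create a C++11 thread and have it run on my first core. I found that pthread_setaffinity_np and sched_setaffinity can change a thread's CPU affinity and migrate it to the specified CPU. However, both of them change the affinity only after the thread is already running.

How can I create a C++11 thread with a specific CPU affinity (a cpu_set_t object)?

If the affinity cannot be specified when a C++11 thread is initialized, how can I do it in C with pthread_t?

My environment is G++ on Ubuntu. A piece of code would be appreciated.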

Answer 1

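I am sorry to be the "myth buster" here, but setting thread affinity is very important, and its importance grows over time as the systems we use become more and more NUMA (Non-Uniform Memory Architecture) by nature. These days even an ordinary dual-socket server has RAM attached separately to each socket, and the difference between a socket accessing its own RAM and accessing the RAM of the neighboring processor socket (remote RAM) is substantial. In the near future, processors will reach the market in which the internal set of cores is itself NUMA (separate memory controllers for separate groups of cores, and so on). There is no need for me to repeat the work of others here: just search online for "NUMA and thread affinity" and you can learn from other engineers' years of experience.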

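Not setting thread affinity effectively amounts to "hoping" that the OS scheduler handles thread placement correctly. Let me explain: your system has several NUMA nodes (processing and memory domains). You start a thread, and the thread does some work with memory, for example it mallocs some memory and then processes it. Up to this point, modern operating systems (at least Linux, and probably others) do a good job: by default, memory is allocated (if available) from the same domain as the CPU the thread is running on. In due time, the time-sharing OS (all modern operating systems) will put the thread to sleep. When the thread becomes runnable again, it may run on any core in the system (because you did not set an affinity mask for it), and the larger your system is, the higher the chance that it will "wake up" on a CPU that is remote from the memory it previously allocated or used. From then on, all of your memory accesses are remote. (Not sure what that means for your application's performance? Read more about remote memory access on NUMA systems online.)

So, to summarize, affinity-setting interfaces are very important when running code on systems with a more than trivial architecture, which these days is rapidly becoming "any system". Some thread runtime environments and libraries allow this to be controlled at run time without any specific programming (see OpenMP, for example the KMP_AFFINITY environment variable in Intel's implementation). It would be appropriate for C++11 implementers to include similar mechanisms in their runtime libraries and language options. Until then, if your code is intended for use on servers, I strongly recommend implementing affinity control in your code.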
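As a minimal sketch of what "no affinity mask" means in practice, the helper below lists the CPUs the calling thread may currently be scheduled on. The name allowed_cpus is hypothetical and not from the answer; the sketch assumes Linux with glibc and g++ (which defines _GNU_SOURCE by default, making pthread_getaffinity_np visible).

```cpp
#include <pthread.h>
#include <sched.h>
#include <vector>

// Hypothetical helper (not from the answer): returns the CPU indices the
// calling thread is currently allowed to run on. An empty result means the
// query failed.
std::vector<int> allowed_cpus() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (pthread_getaffinity_np(pthread_self(), sizeof(mask), &mask) != 0) {
        return {};
    }
    std::vector<int> cpus;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) {
            cpus.push_back(cpu);
        }
    }
    return cpus;
}
```

On an unpinned thread the result usually covers every CPU available to the process; after a successful pthread_setaffinity_np call it shrinks to the CPUs in the new mask.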

Answer 2

link

I rewrote the code from Eli Bendersky's blog (link pasted above). You can save the code below as test.cpp, then compile and run it:

    // g++ -std=c++11 -pthread ./test.cpp && ./a.out
    #include <chrono>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>
    #include <pthread.h>
    #include <sched.h>

    int main(int argc, const char** argv) {
        constexpr unsigned num_threads = 4;
        // A mutex ensures orderly access to std::cout from multiple threads.
        std::mutex iomutex;
        std::vector<std::thread> threads(num_threads);
        for (unsigned i = 0; i < num_threads; ++i) {
            threads[i] = std::thread([&iomutex, i] {
                // Give the main thread a moment to set the affinity below.
                std::this_thread::sleep_for(std::chrono::milliseconds(20));
                while (1) {
                    {
                        // Use a lexical scope and lock_guard to safely lock the mutex only
                        // for the duration of std::cout usage.
                        std::lock_guard<std::mutex> iolock(iomutex);
                        std::cout << "Thread #" << i << ": on CPU " << sched_getcpu() << "\n";
                    }
                    // Simulate important work done by the thread by sleeping for a bit...
                    std::this_thread::sleep_for(std::chrono::milliseconds(900));
                }
            });

            // Create a cpu_set_t object representing a set of CPUs. Clear it and mark
            // only CPU i as set. This is done in the main thread, after threads[i] has
            // been assigned: reading threads[i] from inside the new thread would race
            // with that assignment.
            cpu_set_t cpuset;
            CPU_ZERO(&cpuset);
            CPU_SET(i, &cpuset);
            int rc = pthread_setaffinity_np(threads[i].native_handle(),
                                            sizeof(cpu_set_t), &cpuset);
            if (rc != 0) {
                std::cerr << "Error calling pthread_setaffinity_np: " << rc << "\n";
            }
        }
        // The worker loops never end, so join() blocks until the program is killed.
        for (auto& t : threads) {
            t.join();
        }
        return 0;
    }

Answer 3

pthread_t my_thread_native = my_thread.native_handle();

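Then you can use any of the pthread calls, passing in my_thread_native wherever the call expects the pthread thread ID.

Note that most threading facilities are implementation specific: pthreads, Windows threads, and the native threads of other operating systems all have their own interfaces and types, so this part of your code will not be very portable.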

Answer 4

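Following on from Y00's answer, it seems you can modify the CPU affinity directly from inside the thread itself by calling pthread_self, without having to refer to the external threads array via threads[i]. Of course, this is not portable outside Linux, but it does reduce the complexity.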

Here is an example:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <pthread.h>
    #include <sched.h>
    #include <thread>
    #include <vector>

    int totalCores = 0;

    void runner(int idx) {
        int pinCore = idx % totalCores;
        printf("Launching Running # %d. Pin to %d\n", idx, pinCore);

        // Pin the calling thread itself to pinCore.
        pthread_t self = pthread_self();
        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);
        CPU_SET(pinCore, &cpuset);
        int rc = pthread_setaffinity_np(self, sizeof(cpu_set_t), &cpuset);
        if (rc != 0) {
            // pthread functions return the error code; they do not set errno.
            printf("Failed to pin core: %s\n", strerror(rc));
            exit(1);
        }

        while (1) {
            printf("#%d Running: CPU %d\n", idx, sched_getcpu());
            sleep(1);
        }
    }

    int main(int argc, char **argv) {
        totalCores = static_cast<int>(std::thread::hardware_concurrency());
        if (totalCores == 0) {
            totalCores = 1;  // hardware_concurrency() may return 0 if unknown
        }
        printf("Starting. %d cores\n", totalCores);

        std::vector<std::thread> threadList;
        const int N = 6;
        for (int i = 0; i < N; i++) {
            threadList.push_back(std::thread(runner, i));
        }
        // runner() loops forever, so join() blocks until the program is killed.
        for (auto &th : threadList) {
            if (th.joinable()) th.join();
        }
        printf("Complete\n");
    }

Answer 5

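It seems that we cannot set the CPU affinity when we create a C++ thread.

The reason is that there is no need to specify the affinity when creating a thread. So why bother making it possible in the language? Say we want the workload f() to be bound to CPU0. We can simply change the affinity to CPU0 right before the real workload by calling pthread_setaffinity_np.

However, we CAN specify the affinity when creating a thread in C (thanks to the comment from Tony D). For example, the following code outputs "Hello pthread".

    #include <iostream>
    #include <pthread.h>
    #include <sched.h>

    void *f(void *p) {
        std::cout << "Hello pthread" << std::endl;
        return NULL;
    }

    int main() {
        // Build a CPU set that contains only CPU 0.
        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);
        CPU_SET(0, &cpuset);

        // Attach the CPU set to the attributes used at thread creation.
        pthread_attr_t pta;
        pthread_attr_init(&pta);
        pthread_attr_setaffinity_np(&pta, sizeof(cpuset), &cpuset);

        pthread_t thread;
        if (pthread_create(&thread, &pta, f, NULL) != 0) {
            std::cerr << "Error in creating thread" << std::endl;
        } else {
            pthread_join(thread, NULL);
        }
        pthread_attr_destroy(&pta);
        return 0;
    }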
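The example above does not check which CPU the thread actually ran on. The following sketch, under the same Linux/glibc assumptions, creates a pthread with the affinity set in its attributes and has the thread report sched_getcpu(). The names report_cpu and run_pinned are hypothetical and not part of the answer.

```cpp
#include <pthread.h>
#include <sched.h>

// Thread entry point: store the CPU this thread is executing on.
static void* report_cpu(void* out) {
    *static_cast<int*>(out) = sched_getcpu();
    return nullptr;
}

// Hypothetical helper: start a pthread whose affinity is fixed to `cpu`
// through its creation attributes, wait for it, and return the CPU it
// observed. Returns -1 if any step fails.
int run_pinned(int cpu) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(cpu, &cpuset);

    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return -1;
    }

    int observed = -1;
    pthread_t thread;
    int rc = pthread_attr_setaffinity_np(&attr, sizeof(cpuset), &cpuset);
    if (rc == 0) {
        rc = pthread_create(&thread, &attr, report_cpu, &observed);
    }
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        return -1;
    }

    pthread_join(thread, nullptr);
    return observed;
}
```

Calling run_pinned with a CPU the process is allowed to use should return that same CPU index, since the thread never has a chance to run anywhere else.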
