Two consecutive std::chrono::high_resolution_clock::now() calls give a ~270ns difference

Question

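I want to measure the duration of a piece of code with a std::chrono clock, but it seems too heavy to measure something that lasts only nanoseconds. The program: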

#include <cstdio>
#include <chrono>

int main() {
    using clock = std::chrono::high_resolution_clock;

    // try several times
    for (int i = 0; i < 5; i++) {
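        // two consecutive now() calls, one right after another with nothing in between;
        // stored separately because the evaluation order of the operands of '-' is unspecified
        auto t1 = clock::now();
        auto t2 = clock::now();
        printf("%lldns\n", (long long)std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count());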
    }
    return 0;
}

It always gives me around 100-300ns. Is this because of the two system calls? Is it possible to reduce the time between the two now() calls? Thanks!

Environment: Linux Ubuntu 18.04, kernel 4.18, low load average, standard library dynamically linked.

Answer 1

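If you want to measure the duration of very fast pieces of code, it is usually best to run them many times and take the average time over all runs. The ~200ns you mention then becomes negligible, because it is spread across all the runs.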

Example:

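#include <cstdio>
#include <chrono>

// placeholder: replace with the code under test
void functionYouWantToTime() {}

int main() {
    // alias declared inside main: at namespace scope it may clash with ::clock() from <ctime>
    using clock = std::chrono::high_resolution_clock;

    const int n = 10000; // adjust depending on the expected runtime of your code
    auto start = clock::now();
    for (int i = 0; i < n; ++i)
        functionYouWantToTime();
    auto result = // end minus start, so the value is positive
        std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start).count() / n;
    printf("%lldns per call\n", static_cast<long long>(result));
}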
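The same averaging idea can be pointed at the question itself: timing many back-to-back now() calls and dividing by the count gives an estimate of what a single call costs. A minimal sketch, where the name average_now_cost_ns and the default of one million calls are illustrative assumptions; it defaults to std::chrono::steady_clock because that clock is guaranteed to be monotonic, and the figure includes a little loop overhead:

```cpp
#include <chrono>

// Estimated cost of one Clock::now() call in nanoseconds, averaged over `calls`
// back-to-back calls. The result includes a small amount of loop overhead.
template <class Clock = std::chrono::steady_clock>
double average_now_cost_ns(int calls = 1000000) {
    volatile long long sink = 0;  // volatile store keeps each now() call from being optimized away
    auto start = Clock::now();
    for (int i = 0; i < calls; ++i)
        sink = Clock::now().time_since_epoch().count();
    auto stop = Clock::now();
    // convert to a floating-point nanosecond count before dividing
    return std::chrono::duration<double, std::nano>(stop - start).count() / calls;
}
```

For example, average_now_cost_ns<std::chrono::high_resolution_clock>() measures the clock used in the question, giving a per-call figure that can be compared with the 100-300ns difference reported there.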

Answer 2

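Just don't use clocks for nanosecond benchmarks. Use CPU ticks instead: on any hardware modern enough for nanoseconds to matter, CPU ticks are monotonic, stable, and synchronized across cores.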

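Unfortunately, C++ does not expose the CPU clock, so you have to use the RDTSC instruction directly (it wraps nicely in an inline function, or you can use your compiler's intrinsic). If you like, a difference in CPU ticks can also be converted to time (using the CPU frequency), but for low-latency benchmarks like this that is usually unnecessary.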
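A minimal sketch of the inline wrapper described above, assuming GCC or Clang on x86/x86-64, where <x86intrin.h> provides the __rdtsc() intrinsic (MSVC declares the same intrinsic in <intrin.h>). The names cpu_ticks and ticks_for are hypothetical:

```cpp
#include <cstdint>
#include <x86intrin.h> // GCC/Clang header providing __rdtsc(); x86/x86-64 only

// Thin inline wrapper around the RDTSC instruction, via the compiler intrinsic.
inline std::uint64_t cpu_ticks() {
    return __rdtsc();
}

// TSC ticks elapsed while running f once.
// The count also includes the cost of one cpu_ticks() call.
template <class F>
std::uint64_t ticks_for(F&& f) {
    std::uint64_t start = cpu_ticks();
    f();
    return cpu_ticks() - start;
}
```

RDTSC is not a serializing instruction, so for very short regions the CPU may execute neighbouring instructions out of order relative to the two reads; placing a fence such as _mm_lfence() around them is a common mitigation.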

Answer 3

Use the rdtsc instruction to measure time with the highest resolution and the lowest overhead:

#include <iostream>
#include <cstdint>

int main() {
    uint64_t a = __builtin_ia32_rdtsc();
    uint64_t b = __builtin_ia32_rdtsc();
    std::cout << b - a << " cpu cycles\n";
}

Output:

19 cpu cycles

To convert cycles to nanoseconds, divide them by the base CPU frequency in GHz. For example, for a 4.2GHz i7-7700k, divide by 4.2.
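As a worked example of that conversion: if the 19 cycles shown above were measured on a 4.2 GHz CPU, they would correspond to 19 / 4.2 ≈ 4.5 ns. A small helper, assuming (as this answer does) that the TSC ticks at the base frequency; ticks_to_ns is a hypothetical name:

```cpp
#include <cstdint>

// Convert a TSC tick count to nanoseconds, given the TSC (base) frequency in GHz.
// 1 GHz means one tick per nanosecond, so ns = ticks / GHz.
inline double ticks_to_ns(std::uint64_t ticks, double tsc_ghz) {
    return static_cast<double>(ticks) / tsc_ghz;
}
```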

The TSC is a global counter in the CPU, shared by all cores.

Modern CPUs have a constant TSC, which ticks at the same rate regardless of the current CPU frequency and boost. Look for constant_tsc in the flags field of /proc/cpuinfo.
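On Linux the same flag check can also be done from code. A sketch with hypothetical helper names: it matches whole words so that tsc does not falsely match constant_tsc, it reads only the first flags line on the assumption that all cores report the same flags, and it returns false if the file cannot be read or has no flags line:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// True if `flag` appears as a whole word in a whitespace-separated flags list.
inline bool has_cpu_flag(const std::string& flags, const std::string& flag) {
    std::istringstream words(flags);
    std::string word;
    while (words >> word)
        if (word == flag) return true;
    return false;
}

// Look up `flag` in the first "flags" line of /proc/cpuinfo (Linux only).
inline bool cpuinfo_has_flag(const std::string& flag) {
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.rfind("flags", 0) == 0) {  // line starts with "flags"
            auto colon = line.find(':');
            if (colon == std::string::npos) continue;
            return has_cpu_flag(line.substr(colon + 1), flag);
        }
    }
    return false;
}
```

For example, cpuinfo_has_flag("constant_tsc") is true on a machine whose TSC rate does not follow frequency scaling.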

Also note that __builtin_ia32_rdtsc is more efficient than inline assembly; see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48877
