Timings shown in Google Benchmark results don't make sense

Question

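I am benchmarking some sample functions on a processor whose cores each run at 2 GHz. Here are the functions being benchmarked. The code is also available on quick-bench.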

#include <benchmark/benchmark.h>
#include <stdlib.h>
#include <time.h>
#include <cstdint>
#include <memory>
#include <string>

class Base
{
  public:       
   virtual int addNumVirt( int x ) { return (i + x); }
   int addNum( int x ) { return (x + i); }
   virtual ~Base() {}

  private:
   uint32_t i{10};
};

class Derived : public Base
{
  public:
   // Overrides of virtual functions are always virtual
   int addNumVirt( int x ) { return (x + i); }
   int addNum( int x ) { return (x + i); }

  private:
   uint32_t i{20};
};

static void BM_nonVirtualFunc(benchmark::State &state)
{
 srand(time(0));
 volatile int x = rand();
 std::unique_ptr<Derived> derived = std::make_unique<Derived>();
 for (auto _ : state)
 {
   auto result = derived->addNum( x );
   benchmark::DoNotOptimize(result);
 }
}
BENCHMARK(BM_nonVirtualFunc);

static void BM_virtualFunc(benchmark::State &state)
{
 srand(time(0));
 volatile int x = rand();
 std::unique_ptr<Base> derived = std::make_unique<Derived>();
 for (auto _ : state)
 {
   auto result = derived->addNumVirt( x );
   benchmark::DoNotOptimize(result);
 }
}
BENCHMARK(BM_virtualFunc);

static void StringCreation(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    std::string created_string("hello");
    // Make sure the variable is not optimized away by compiler
    benchmark::DoNotOptimize(created_string);
  }
}
// Register the function as a benchmark
BENCHMARK(StringCreation);

static void StringCopy(benchmark::State& state) {
  // Code before the loop is not measured
  std::string x = "hello";
  for (auto _ : state) {
    std::string copy(x);
  }
}
BENCHMARK(StringCopy);
Below are the Google Benchmark results.

Run on (64 X 2000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x32)
  L1 Instruction 64K (x32)
  L2 Unified 512K (x32)
  L3 Unified 8192K (x8)
Load Average: 0.08, 0.04, 0.00
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_nonVirtualFunc      0.490 ns        0.490 ns   1000000000
BM_virtualFunc         0.858 ns        0.858 ns    825026009
StringCreation          2.74 ns         2.74 ns    253578500
BM_StringCopy           5.24 ns         5.24 ns    132874574

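The results show execution times of 0.490 ns and 0.858 ns for the first two functions. What I do not understand is this: if my cores run at 2 GHz, one cycle takes 0.5 ns, which makes these results look unreasonable.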

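I know the result shown is an average over the number of iterations. Such a low execution time would mean that most of the samples took less than 0.5 ns.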
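The arithmetic behind this can be written out as a small sketch. It assumes the nominal 2000 MHz reported in the benchmark header is the actual clock (the real frequency may differ under frequency scaling). `cycle_ns` and `cycles_per_iteration` are hypothetical helper names, not part of Google Benchmark.

```cpp
#include <cassert>

// Length of one clock cycle in nanoseconds for a clock given in MHz.
// Assumption: the nominal frequency is the frequency the core actually ran at.
double cycle_ns(double clock_mhz) {
    assert(clock_mhz > 0.0);
    return 1000.0 / clock_mhz;
}

// Average number of clock cycles that one benchmark iteration accounts for.
double cycles_per_iteration(double time_ns, double clock_mhz) {
    return time_ns / cycle_ns(clock_mhz);
}
```

With the reported numbers, `cycles_per_iteration(0.490, 2000.0)` comes out just under one cycle and `cycles_per_iteration(0.858, 2000.0)` at roughly 1.7 cycles. The averages sit close to or below a single cycle per iteration, which is what prompts the question.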

What am I missing?

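Edit 1:

From the comments it looks like adding a constant i to x was not a good idea. In fact, I started out by calling std::cout in the virtual and non-virtual functions. That helped me understand that virtual functions are not inlined and that the call has to be resolved at run time.

However, having the benchmarked functions print to the terminal does not look nice. (Is there a way to share my code via Godbolt?) Can anyone suggest an alternative to printing something inside the function?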
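One possible alternative to printing, offered only as a sketch, is to ask the compiler not to inline the functions, so that each call stays a real call without producing any terminal output. The version below assumes GCC or Clang, which understand the `[[gnu::noinline]]` attribute; other compilers may ignore it. `callThroughBase` is a hypothetical helper added for illustration. Note that this only discourages inlining. The compiler may still resolve the virtual call statically when it can see the dynamic type.

```cpp
#include <cstdint>

class Base {
public:
    virtual ~Base() = default;
    // Assumption: GCC/Clang attribute; keeps the body out of the caller.
    [[gnu::noinline]] virtual int addNumVirt(int x) { return x + static_cast<int>(i); }
    [[gnu::noinline]] int addNum(int x) { return x + static_cast<int>(i); }

private:
    std::uint32_t i{10};
};

class Derived : public Base {
public:
    [[gnu::noinline]] int addNumVirt(int x) override { return x + static_cast<int>(i); }
    [[gnu::noinline]] int addNum(int x) { return x + static_cast<int>(i); }

private:
    std::uint32_t i{20};
};

// Hypothetical helper: the caller only sees a Base&, so the call goes
// through the virtual interface rather than naming Derived directly.
int callThroughBase(Base& b, int x) { return b.addNumVirt(x); }
```

Inside the benchmark loop, the return value can still be passed to `benchmark::DoNotOptimize`, as in the original code, so the call is not discarded.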


Answer 1

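Modern compilers simply do magnificent things. Not always the most predictable things, but usually good ones. You can see this by looking at the generated ASM, as suggested, or by lowering the optimization level. Optim = 1 makes nonVirtualFunc equivalent to virtualFunc in terms of CPU time, and optim = 0 brings all of your functions to a similar level (edit: in a bad way, of course; do not actually do this to draw performance conclusions).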
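To make this concrete, here is a rough sketch of the kind of reduction an optimizer is free to make for the non-virtual case. This is an illustration under assumptions, not the actual output of any particular compiler. The generated ASM remains the authority. `viaMemberCall` and `handInlined` are hypothetical names.

```cpp
#include <cstdint>

class Derived {
public:
    int addNum(int x) { return x + static_cast<int>(i); }

private:
    std::uint32_t i{20};
};

// What the source asks for: construct an object and call a member function.
int viaMemberCall(int x) {
    Derived d;
    return d.addNum(x);
}

// What an optimizer may reduce that to once the call is inlined and the
// member's value is known: a single addition with a constant.
int handInlined(int x) {
    return x + 20;
}
```

If the loop body collapses to something like `handInlined`, each iteration costs little more than one add and a load of the volatile `x`. A core that can execute several instructions per cycle can plausibly average that in under one cycle per iteration.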
