What is the fastest way to map a uint64_t integer {0,1} to a 64-bit float {1.0,-1.0}?

Question

I have a 64-bit uint64_t flag variable named uintflag that can only hold the value 0 or 1.

I need to convert 0 to 1.0 and 1 to -1.0. The conversion is from uint64_t (64-bit) to double (64-bit).

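The benchmarks are run on x86-64, but I am writing GNU code and do not know which processor it will end up running on, so the goal is portable code that works across architectures. To stay correct on every platform, I cannot rely on an x86-specific bit format. On non-x86 platforms I prioritize correctness over performance.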

On x86, converting a uint to a double takes 5 cycles, so this is expensive:

return double(1.0 - 2.0 * uintflag);

An if statement is not much better, perhaps because it is not branchless, and branchless code tends to be faster.

return uintflag ? double(-1.0) : double(1.0);

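I also tried an array holding the floating-point values, using the uint as the index.

This tends to be better because it is branchless and does no int-to-double conversion, but I suspect the value is fetched from L1 cache, which takes about 10 cycles.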

double arrayMap[2] = {1.0, -1.0};    
return arrayMap[uintflag];

Is there any way to ensure the array lives in CPU registers instead of being fetched from L1 cache?
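
As a quick sanity check of the three approaches described above, the following sketch defines each of them as a standalone function. It also includes one further variant that is an assumption, not taken from the original code: casting the flag through int64_t before converting to double, since on x86-64 a signed 64-bit conversion is typically a single instruction while an unsigned one may need extra work on CPUs without AVX-512. All function names here are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic mapping with an unsigned-to-double conversion.
double mapArithmetic(std::uint64_t flag) {
  assert(flag <= 1);
  return 1.0 - 2.0 * static_cast<double>(flag);
}

// Hypothetical variant: convert through a signed type first. This is valid
// only because flag is assumed to be 0 or 1, so the cast cannot overflow.
double mapArithmeticSigned(std::uint64_t flag) {
  assert(flag <= 1);
  return 1.0 - 2.0 * static_cast<double>(static_cast<std::int64_t>(flag));
}

// Conditional selection; the compiler may or may not emit a branch for this.
double mapConditional(std::uint64_t flag) {
  assert(flag <= 1);
  return flag ? -1.0 : 1.0;
}

// Table lookup; constexpr lets the compiler treat the table as read-only.
constexpr double kSignTable[2] = {1.0, -1.0};

double mapTable(std::uint64_t flag) {
  assert(flag <= 1);
  return kSignTable[flag];
}
```

All four functions should agree on both inputs; which one is fastest depends on the compiler, the optimization flags, and the target CPU, so it has to be measured.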

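I am getting inconsistent benchmark results. Can you help me determine the right way to benchmark this, and the most efficient way to convert uintflag to a double?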

Average time for getFromArray: 5768500 nanoseconds
Average time for ifBranching: 6114700 nanoseconds
Average time for mathFormulaCasingtoFloat: 6803000 nanoseconds
   
Average time for getFromArray: 8157100 nanoseconds
Average time for ifBranching: 9361300 nanoseconds
Average time for mathFormulaCasingtoFloat: 16988800 nanoseconds

Average time for getFromArray: 8792100 nanoseconds
Average time for ifBranching: 7761900 nanoseconds
Average time for mathFormulaCasingtoFloat: 8643100 nanoseconds

Here is the code:

#include <iostream>
#include <chrono>
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <map>

using namespace std;

typedef double (*conversion_function)(uint64_t);

double ifBranching(uint64_t uintflag)
{
  return uintflag ? double(-1.0) : double(1.0);
}

double arrayMap[2] = {1.0, -1.0};

double getFromArray(uint64_t uintflag)
{
  return arrayMap[uintflag];
}

double mathFormulaCasingtoFloat(uint64_t uintflag)
{
  return double(1.0 - 2.0 * uintflag);
}

int main()
{
  uint64_t uintflag = 1;

  map<string, conversion_function> conversion_functions;
  conversion_functions["ifBranching"] = ifBranching;
  conversion_functions["getFromArray"] = getFromArray;
  conversion_functions["mathFormulaCasingtoFloat"] = mathFormulaCasingtoFloat;

  const int numRuns = 100000;

  for (auto &[functionName, function] : conversion_functions)
  {
    auto totalDuration = chrono::duration<double, nano>::zero();

    for (int i = 0; i < numRuns; i++)
    {
      auto start = chrono::high_resolution_clock::now();
      function(uintflag);
      auto end = chrono::high_resolution_clock::now();
      auto duration = chrono::duration_cast<chrono::nanoseconds>(end - start).count();

      totalDuration += chrono::duration<double, nano>(duration);
    }

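    // Note: totalDuration is the sum over all numRuns calls, so the value
    // printed below is a total, not a per-call average.
    auto x = chrono::duration_cast<chrono::nanoseconds>(totalDuration);
    string strDuration = to_string(x.count());
    cout << "Average time for " << functionName << ": " << strDuration << " nanoseconds" << endl;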
  }

  return 0;
}

Answer 1

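Following Howard Hinnant's comment, I found this to be the fastest, but I am not sure it makes sense in real code.
Declaring constArrayMap as constexpr seems to help, but I could not get consistent benchmarks on my PC.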

constexpr double constArrayMap[2] = {1.0, -1.0};
constexpr double constexprArray(uint64_t uintFlag)
{

  volatile double answer = constArrayMap[uintFlag]; //volatile is only used for benchmarking
  return answer;
}

I do not get consistent benchmarks between runs, but here is a typical one:

Average time for constexprArray: 311.800000 nanoseconds
Average time for getFromArray: 358.900000 nanoseconds
Average time for getFromConstArray: 322.200000 nanoseconds
Average time for ifBranching: 748.100000 nanoseconds
Average time for mathFormulaCastingtoFloat: 340.500000 nanoseconds
Average time for mathFormulaDelayedCastingtoFloat: 459.600000 nanoseconds
Average time for switchBranching: 820.000000 nanoseconds

I was hoping for something like GOTO uintFlag*space, or some bitwise trick that selects between two constants. Something like this (which does not work):

constexpr double ONE=1.0;
constexpr double MINUS_ONE=-1.0;
double bitwiseChoose(uint64_t uintFlag)
// this code is illustrative and does not work (operator& is not defined for double)
{ // if uintFlag is 0, MINUS_ONE is erased, if uintFlag is 1, ONE is erased
  volatile double answer = (MINUS_ONE & uintFlag) | (ONE & ~uintFlag);
  return answer;
}
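
A working version of the bitwise idea is possible because, in IEEE 754 binary64, 1.0 and -1.0 differ only in the sign bit (bit 63). The sketch below assumes that double is an IEEE 754 type (checked at compile time through std::numeric_limits<double>::is_iec559) and that integers and doubles share the same byte order, which holds on mainstream platforms but is not guaranteed by the C++ standard. The name signFromFlag is illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "this trick assumes IEEE 754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this trick assumes a 64-bit double");

// Maps 0 -> 1.0 and 1 -> -1.0 by moving the flag into the sign bit.
double signFromFlag(std::uint64_t flag) {
  assert(flag <= 1);
  // Bit pattern of 1.0 in binary64: sign 0, exponent 0x3FF, mantissa 0.
  constexpr std::uint64_t kOneBits = 0x3FF0000000000000ULL;
  const std::uint64_t bits = kOneBits | (flag << 63);
  double result;
  std::memcpy(&result, &bits, sizeof result);  // well-defined type pun
  return result;
}
```

With C++20, std::bit_cast<double>(bits) can replace the memcpy. Whether this beats the table lookup depends on how expensive it is to move a value from an integer register to a floating-point register on the target CPU, so it would still need to be measured.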

Here is the code I used, based on the comments:

#include <iostream>
#include <chrono>
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <map>
#include <random>
#include <immintrin.h> // _mm_prefetch (x86 only)

using namespace std;

typedef double (*conversion_function)(uint64_t);

double ifBranching(uint64_t uintFlag)
{
  volatile double answer = uintFlag ? double(-1.0) : double(1.0);
  return answer;
}
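double switchBranching(uint64_t uintFlag)
{
  // volatile double answer;
  switch (uintFlag)
  {
  case 0:
    return double(1.0);
  case 1:
    return double(-1.0);
  }
  // Not reached when uintFlag is 0 or 1; avoids falling off the end of a
  // non-void function, which is undefined behavior.
  return double(1.0);
}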

double arrayMap[2] = {1.0, -1.0};

double getFromArray(uint64_t uintFlag)
{
  volatile double answer = arrayMap[uintFlag];
  return answer;
}

constexpr double constArrayMap[2] = {1.0, -1.0};
double getFromConstArray(uint64_t uintFlag)
{
  volatile double answer = constArrayMap[uintFlag];
  return answer;
}

double mathFormulaCastingtoFloat(uint64_t uintFlag)
{
  volatile double answer = double(1.0 - 2.0 * uintFlag);
  return answer;
}
double mathFormulaDelayedCastingToFloat(uint64_t uintFlag)
{
  // Use signed arithmetic: with uint64_t, 1 - 2 * 1 wraps around to 2^64 - 1
  // instead of -1.
  volatile double answer = 1 - 2 * int64_t(uintFlag);
  return answer;
}
constexpr double constexprArray(uint64_t uintFlag)
{

  volatile double answer = constArrayMap[uintFlag];
  return answer;
}

int main()
{
  unsigned long long uintFlag[100000];
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<> dis(0, 1);

  for (int i = 0; i < 100000; i++)
  {
    uintFlag[i] = dis(gen);
  }

  map<string, conversion_function> conversion_functions;
  conversion_functions["ifBranching"] = ifBranching;
  conversion_functions["getFromArray"] = getFromArray;
  conversion_functions["getFromConstArray"] = getFromConstArray;
  conversion_functions["constexprArray"] = constexprArray;
  conversion_functions["switchBranching"] = switchBranching;
  conversion_functions["mathFormulaCastingtoFloat"] = mathFormulaCastingtoFloat;
  conversion_functions["mathFormulaDelayedCastingtoFloat"] = mathFormulaDelayedCastingToFloat;

  const int numRuns = 100000;

  for (auto &[functionName, function] : conversion_functions)
  {
    auto start = chrono::high_resolution_clock::now();

    _mm_prefetch(constArrayMap, _MM_HINT_T0);
    for (auto flag : uintFlag)
    {
      volatile double answer = function(flag);
    }

    auto end = chrono::high_resolution_clock::now();
    auto totalDuration = chrono::duration_cast<chrono::nanoseconds>(end - start).count();

    // totalDuration is the elapsed time in nanoseconds for all 100000 calls.
    // The value printed is totalDuration / 1000, so the per-call average is
    // the printed value divided by 100.
    string strDuration = to_string(totalDuration / 1e3);
    cout << "Average time for " << functionName << ": " << strDuration << " nanoseconds" << endl;
  }
  cout << endl;
  return 0;
}
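
Since the harness above prints a scaled total rather than a per-call figure, a variant that reports an actual per-call average may be easier to compare across runs. The sketch below is one possible shape, not a definitive methodology: it times a whole batch with steady_clock, sums the results into a checksum that is returned so the calls stay observable, and divides by the number of calls. BenchResult, averageNanosPerCall, and mapTable are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

using conversion_function = double (*)(std::uint64_t);

struct BenchResult {
  double nanosPerCall;  // average wall-clock time per call
  double checksum;      // sum of all results; keeps the calls observable
};

// Times repeats passes over flags and reports the average time per call.
BenchResult averageNanosPerCall(conversion_function fn,
                                const std::vector<std::uint64_t>& flags,
                                int repeats) {
  assert(fn != nullptr);
  const double calls = static_cast<double>(flags.size()) * repeats;
  if (calls <= 0.0) {
    return {0.0, 0.0};  // nothing to measure
  }
  double checksum = 0.0;
  const auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repeats; ++r) {
    for (std::uint64_t flag : flags) {
      checksum += fn(flag);
    }
  }
  const auto end = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::nano> elapsed = end - start;
  return {elapsed.count() / calls, checksum};
}

// Example function under test (hypothetical name).
double mapTable(std::uint64_t flag) {
  static constexpr double table[2] = {1.0, -1.0};
  return table[flag];
}
```

Running this several times per function with optimizations enabled, and comparing the minimum or median rather than a single run, tends to give more stable numbers. The indirect call through the function pointer is itself part of what gets measured.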
