I have a 64-bit uint64_t flag variable named uintflag that can only hold the value 0 or 1. I need to convert 0 to 1.0 and 1 to -1.0. The conversion is from a 64-bit unsigned integer (uint64_t) to a 64-bit double.
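Although the benchmarks are run on an x86-64 platform, I am writing for GNU and do not know which processors the code will run on, so the goal is portable code that works across architectures. To stay correct on every platform, I cannot rely on x86-specific bit formats. On non-x86 platforms I prioritize correctness over performance.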
On x86, converting a uint to a double takes 5 cycles, so this is expensive:
return double(1.0 - 2.0 * uintflag);
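A variant of the formula above that may be worth measuring does the arithmetic in a signed 64-bit integer and converts to double once at the end. On x86-64 without AVX-512, compilers typically need a longer instruction sequence to convert an unsigned 64-bit integer than a signed one, so this can be cheaper there, though only a measurement can confirm it. This is a minimal sketch; the name signedFormula is hypothetical, and it assumes the flag is only ever 0 or 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name, not from the original post.
double signedFormula(std::uint64_t uintflag)
{
    assert(uintflag <= 1); // assumption: the flag is only ever 0 or 1
    // Integer arithmetic in a signed type: 1 - 2*0 = 1 and 1 - 2*1 = -1.
    const std::int64_t sign = 1 - 2 * static_cast<std::int64_t>(uintflag);
    // A single signed integer-to-double conversion.
    return static_cast<double>(sign);
}
```

Because it uses only standard integer arithmetic and a standard conversion, this does not depend on any particular floating-point bit layout.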
An if is not much better, perhaps because it is not branchless, and branchless code tends to be faster:
return uintflag ? double(-1.0) : double(1.0);
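I also tried an array holding the floating-point values, indexed by the uint. This tends to do better because it is branchless and performs no int-to-double conversion, but I suspect the value is fetched from the L1 cache, which takes about 10 cycles.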
double arrayMap[2] = {1.0, -1.0};
return arrayMap[uintflag];
Is there any way to make sure the array stays in CPU registers instead of being fetched from the L1 cache?
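The benchmark results I get are inconsistent. Can you help me find the correct way to benchmark this, and the most efficient way to convert uintflag to a double?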
Average time for getFromArray: 5768500 nanoseconds
Average time for ifBranching: 6114700 nanoseconds
Average time for mathFormulaCasingtoFloat: 6803000 nanoseconds
Average time for getFromArray: 8157100 nanoseconds
Average time for ifBranching: 9361300 nanoseconds
Average time for mathFormulaCasingtoFloat: 16988800 nanoseconds
Average time for getFromArray: 8792100 nanoseconds
Average time for ifBranching: 7761900 nanoseconds
Average time for mathFormulaCasingtoFloat: 8643100 nanoseconds
Here is the code:
#include <iostream>
#include <chrono>
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <map>
using namespace std;

typedef double (*conversion_function)(uint64_t);

// Ternary version: may compile to a branch or to a conditional select.
double ifBranching(uint64_t uintflag)
{
    return uintflag ? double(-1.0) : double(1.0);
}

// Lookup-table version: the flag is used directly as the index.
double arrayMap[2] = {1.0, -1.0};
double getFromArray(uint64_t uintflag)
{
    return arrayMap[uintflag];
}

// Arithmetic version: converts the unsigned flag to double, then computes 1 - 2 * flag.
double mathFormulaCasingtoFloat(uint64_t uintflag)
{
    return double(1.0 - 2.0 * uintflag);
}

int main()
{
    uint64_t uintflag = 1;
    map<string, conversion_function> conversion_functions;
    conversion_functions["ifBranching"] = ifBranching;
    conversion_functions["getFromArray"] = getFromArray;
    conversion_functions["mathFormulaCasingtoFloat"] = mathFormulaCasingtoFloat;
    const int numRuns = 100000;
    for (auto &[functionName, function] : conversion_functions)
    {
        auto totalDuration = chrono::duration<double, nano>::zero();
        for (int i = 0; i < numRuns; i++)
        {
            // Each call is timed on its own, so the two clock reads are part of the measurement.
            auto start = chrono::high_resolution_clock::now();
            function(uintflag);
            auto end = chrono::high_resolution_clock::now();
            auto duration = chrono::duration_cast<chrono::nanoseconds>(end - start).count();
            totalDuration += chrono::duration<double, nano>(duration);
        }
        // Note: this is the total over numRuns calls; divide by numRuns for a per-call average.
        auto x = chrono::duration_cast<chrono::nanoseconds>(totalDuration);
        string strDuration = to_string(x.count());
        cout << "Average time for " << functionName << ": " << strDuration << " nanoseconds" << endl;
    }
    return 0;
}
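One possible cause of the noisy numbers is that the loop above reads the clock twice around every single conversion, so the clock overhead is probably much larger than the conversion being measured. A minimal sketch of an alternative is to time one whole pass over many flags and divide by the number of calls. The names nanosPerCall and sink are hypothetical and not from the code above. Summing the results is just one assumed way to keep the compiler from discarding the calls, and it adds one floating-point addition per iteration to the measurement.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical helper: times one pass over all flags and returns the mean
// cost per call in nanoseconds. The sum of all results is written to `sink`
// so that the calls have an observable effect.
template <typename F>
double nanosPerCall(F convert, const std::vector<std::uint64_t>& flags, double& sink)
{
    assert(!flags.empty()); // avoid dividing by zero below

    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (std::uint64_t flag : flags)
        sum += convert(flag); // use every result
    const auto stop = std::chrono::steady_clock::now();

    sink = sum; // visible to the caller, so the loop cannot simply be dropped

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(flags.size());
}
```

A call such as nanosPerCall(getFromArray, flags, sink) would then report nanoseconds per call for one candidate. Repeating the measurement several times and comparing the fastest runs is a common way to reduce run-to-run noise further.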
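Following Howard Hinnant's comment, I found this to be the fastest, but I'm not sure it makes sense in real code.
Declaring constArrayMap as constexpr seems to help, but I cannot get consistent benchmark runs on my PC.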
constexpr double constArrayMap[2] = {1.0, -1.0};
constexpr double constexprArray(uint64_t uintFlag)
{
    volatile double answer = constArrayMap[uintFlag]; // volatile is only used for benchmarking
    return answer;
}
The results are not consistent between runs, but this is a typical one:
Average time for constexprArray: 311.800000 nanoseconds
Average time for getFromArray: 358.900000 nanoseconds
Average time for getFromConstArray: 322.200000 nanoseconds
Average time for ifBranching: 748.100000 nanoseconds
Average time for mathFormulaCastingtoFloat: 340.500000 nanoseconds
Average time for mathFormulaDelayedCastingtoFloat: 459.600000 nanoseconds
Average time for switchBranching: 820.000000 nanoseconds
I was hoping for something like GOTO uintFlag*space, or some bitwise trick that selects between the two constants. Something like this (which does not work):
constexpr double ONE=1.0;
constexpr double MINUS_ONE=-1.0;
// This code is illustrative and does not work.
double bitwiseChoose(uint64_t uintFlag)
{
    // if uintFlag is 0, MINUS_ONE is erased; if uintFlag is 1, ONE is erased
    volatile double answer = (MINUS_ONE & uintFlag) | (ONE & ~uintFlag);
    return answer;
}
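For comparison, here is a minimal sketch of a bitwise version that does compile. It is not portable in the sense asked for above: it assumes IEEE-754 binary64 doubles, where 1.0 and -1.0 differ only in the top (sign) bit, and it assumes that double and uint64_t share the same byte order. The static_assert lines check what can be checked at compile time. The name bitwiseChooseSketch is hypothetical, and std::memcpy is used to move the bits between the two types.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "sketch assumes IEEE-754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Hypothetical name, not from the original post.
double bitwiseChooseSketch(std::uint64_t uintFlag)
{
    assert(uintFlag <= 1); // assumption: the flag is only ever 0 or 1

    const double one = 1.0;
    std::uint64_t bits;
    std::memcpy(&bits, &one, sizeof bits); // bit pattern of 1.0

    bits |= uintFlag << 63; // set the sign bit when the flag is 1

    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

On a platform where the static_assert lines fail, one of the portable versions (the array or the ternary) would be the fallback.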
This is the code I used, based on the comments:
#include <iostream>
#include <chrono>
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <map>
#include <random>
#include <immintrin.h> // _mm_prefetch (x86 only)
using namespace std;

typedef double (*conversion_function)(uint64_t);

double ifBranching(uint64_t uintFlag)
{
    volatile double answer = uintFlag ? double(-1.0) : double(1.0);
    return answer;
}

double switchBranching(uint64_t uintFlag)
{
    // volatile double answer;
    switch (uintFlag)
    {
    case 0:
        return double(1.0);
    default: // any nonzero flag, as in ifBranching; also avoids falling off the end
        return double(-1.0);
    }
}

double arrayMap[2] = {1.0, -1.0};
double getFromArray(uint64_t uintFlag)
{
    volatile double answer = arrayMap[uintFlag];
    return answer;
}

constexpr double constArrayMap[2] = {1.0, -1.0};
double getFromConstArray(uint64_t uintFlag)
{
    volatile double answer = constArrayMap[uintFlag];
    return answer;
}

double mathFormulaCastingtoFloat(uint64_t uintFlag)
{
    volatile double answer = double(1.0 - 2.0 * uintFlag);
    return answer;
}

double mathFormulaDelayedCastingToFloat(uint64_t uintFlag)
{
    // Signed arithmetic is required here: 1 - 2 * uint64_t(1) wraps to 2^64 - 1 instead of -1.
    volatile double answer = double(1 - 2 * int64_t(uintFlag));
    return answer;
}

constexpr double constexprArray(uint64_t uintFlag)
{
    volatile double answer = constArrayMap[uintFlag];
    return answer;
}

int main()
{
    // Random 0/1 flags so the branch predictor cannot learn a fixed pattern.
    unsigned long long uintFlag[100000];
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(0, 1);
    for (int i = 0; i < 100000; i++)
    {
        uintFlag[i] = dis(gen);
    }
    map<string, conversion_function> conversion_functions;
    conversion_functions["ifBranching"] = ifBranching;
    conversion_functions["getFromArray"] = getFromArray;
    conversion_functions["getFromConstArray"] = getFromConstArray;
    conversion_functions["constexprArray"] = constexprArray;
    conversion_functions["switchBranching"] = switchBranching;
    conversion_functions["mathFormulaCastingtoFloat"] = mathFormulaCastingtoFloat;
    conversion_functions["mathFormulaDelayedCastingtoFloat"] = mathFormulaDelayedCastingToFloat;
    const int numRuns = 100000;
    for (auto &[functionName, function] : conversion_functions)
    {
        // Time one whole pass over the flags instead of each call separately.
        auto start = chrono::high_resolution_clock::now();
        _mm_prefetch(reinterpret_cast<const char *>(constArrayMap), _MM_HINT_T0);
        for (auto flag : uintFlag)
        {
            volatile double answer = function(flag);
        }
        auto end = chrono::high_resolution_clock::now();
        auto totalDuration = chrono::duration_cast<chrono::nanoseconds>(end - start).count();
        // Note: the next two lines print totalDuration / 1000 (the total in microseconds),
        // not a per-call average in nanoseconds.
        auto x = chrono::duration_cast<chrono::nanoseconds>(chrono::seconds(totalDuration));
        string strDuration = to_string(x.count() / 1e12);
        cout << "Average time for " << functionName << ": " << strDuration << " nanoseconds" << endl;
    }
    cout << endl;
    return 0;
}