C++ basic floating-point parsing code gives slightly different results from std::atof

Question

#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <vector>
#include <iomanip>
#include <cmath>
#include <cstdlib>      // std::atof
#include <type_traits>  // std::is_floating_point_v
using namespace std;

#define LIKELY(expr) (__builtin_expect(!!(expr), 1))
#define UNLIKELY(expr) (__builtin_expect(!!(expr), 0))

bool abscmp(double a, double b)
{
  if (isnan(a) && isnan(b)) return true;
  if (isnan(a) ^ isnan(b)) return false;
  return a == b;
}

template <typename T>
inline __attribute__((always_inline)) T ParseFloat(const char *a) {
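  // 10^-1 .. 10^-17. There is no bounds check below, so an input with more
  // than 17 fractional digits would read past the end of this table.
  static constexpr T multers[] = {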
    0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001, 0.0000001, 0.00000001, 0.000000001, 0.0000000001, 0.00000000001,
    0.000000000001, 0.0000000000001, 0.00000000000001, 0.000000000000001, 0.0000000000000001, 0.00000000000000001
  };
  static_assert(std::is_floating_point_v<T>);
  int i = (a[0] == '-') | (a[0] == '+');
  T res = 0.0;
  int sign = 1 - 2 * (a[0] == '-');
  if (UNLIKELY(!a[0])) return NAN;

  while (a[i] && a[i] != '.') {
    if (UNLIKELY(a[i] < '0' || a[i] > '9')) {
      return NAN;
    }
    res = res * static_cast<T>(10.0) + a[i] - '0';
    i++;
  }

  if (LIKELY(a[i] != '\0')) {
    i++;
    int j = i;
    //T mult = 0.1;
    while (a[i]) {
      if (UNLIKELY(a[i] < '0' || a[i] > '9')) {
        return NAN;
      }
      res = res + (a[i] - '0') * multers[i - j];
      // res = res + (a[i] - '0') * mult;
      // mult *= 0.1;
      i++;
    }
  }

  return res * sign;
}

int main()
{
  string inputs[] = {
    "31.0911863667",
    "30.9500",
    "225.1293333333",
    "16.4850",
    "29.0507297346",
    "147.9440517474",
    "28.8500",
    "213.4600",
    "212.9105553333",
    "199.1553333333",
    "19.5884123000",
    "3092458.37500000000"
  };

  int n = sizeof(inputs) / sizeof(inputs[0]);
  for (int i = 0; i < n; i++) {
    float res1 = std::atof(inputs[i].c_str());
    float res2 = ParseFloat<double>(inputs[i].c_str());
    if (!abscmp(res1, res2)) {
      cout << std::fixed << std::setprecision(20) << "CompareConvert " << res1 << " " << res2 << " " << std::string(inputs[i]) << std::endl;
    } else {
      cout << std::fixed << std::setprecision(20) << "Correct " << res1 << std::endl;
    }
  }
}

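I'm writing a simple parser (with full validity checking) because std::atof is too slow: ParseFloat is on average 3.2 times faster on my test input (parsing gigabytes of CSV files). The formula is simple,

res = res * 10 + (a[i] - '0');

but it gives slightly different results.

I know this is due to the limitations of IEEE-754 floating point. But is there any cheap way to make ParseFloat give exactly the same results as std::atof? I need them to be identical because the values are passed to a legacy module that checks equality with sha256sum instead of fabs(a - b) < epsilon.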
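If the legacy module hashes the raw bytes of the values (an assumption; the question only says it uses sha256sum), then "the same" means bit-for-bit identical, which is stricter than the == used in abscmp: 0.0f == -0.0f is true even though the two bit patterns, and therefore their hashes, differ. A minimal sketch of a bitwise check; SameBits and AssertSameAsAtof are hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true only if a and b have identical IEEE-754 bit
// patterns, i.e. what a hash over the raw bytes would treat as equal.
// Unlike ==, it tells 0.0f and -0.0f apart, and it treats two NaNs as
// equal only if their bit patterns match.
bool SameBits(float a, float b) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  std::uint32_t ua = 0;
  std::uint32_t ub = 0;
  std::memcpy(&ua, &a, sizeof a);  // memcpy avoids strict-aliasing problems
  std::memcpy(&ub, &b, sizeof b);
  return ua == ub;
}

// Hypothetical debug hook: checks that a value from the fast parser is
// bit-identical to what the legacy path, (float)std::atof(s), produces.
void AssertSameAsAtof(const char* s, float parsed) {
  assert(SameBits(parsed, static_cast<float>(std::atof(s))));
}
```

Running AssertSameAsAtof over a sample of real CSV fields in a debug build is a cheap way to catch mismatches before they reach the hash; with NDEBUG defined the check compiles away.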

Compiled with:

g++ -o main main.cpp -O3 -std=c++17

using gcc 10.2.0.

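Edit: why the last input comes out wrong. Near 3092458, consecutive float values are 0.25 apart, so the fractional part 0.375 lies exactly halfway between the two representable fractional parts 0.25 and 0.5. With this parsing method, however, the result is rounded after every digit. In float arithmetic, at the digit .3 the intermediate result is already rounded, 3092458.0 + 0.3 == 3092458.25, and adding the remaining 0.075 still gives 3092458.25. (main() actually calls ParseFloat<double>: there the partial sums are rounded much more finely, but they still end slightly below 3092458.375, so the final conversion to float rounds down to 3092458.25, whereas the exact 3092458.375 returned by std::atof is a tie that rounds to the even neighbour 3092458.5.)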
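The rounding described in the edit can be checked directly. The sketch below (function names are hypothetical) replays the question's fraction loop for "3092458.375" by adding the digit terms one at a time, once in float and once in double, and compares the results with the exact value std::atof returns. It assumes IEEE-754 float/double with round-to-nearest, no excess precision (e.g. x86-64 with SSE2) and no -ffast-math.

```cpp
#include <cassert>
#include <cstdlib>

// In float, values near 3092458 are 0.25 apart, so the first step already
// snaps to 3092458.25 and adding the remaining 0.075 cannot move it.
float NaiveSumFloat() {
  float res = 3092458.0f;
  res = res + 0.3f;    // 3092458.3   -> 3092458.25
  res = res + 0.075f;  // 3092458.325 -> 3092458.25
  return res;
}

// The same digits in double, as ParseFloat<double> adds them (digit times
// 10^-k, one rounded addition per digit). Doubles near 3092458 are 2^-31
// apart; all three roundings go down and end one step below .375.
double NaiveSumDouble() {
  double res = 3092458.0;
  res = res + 3 * 0.1;
  res = res + 7 * 0.01;
  res = res + 5 * 0.001;
  return res;
}

// Checks the whole chain: atof yields exactly 3092458.375, a float tie that
// rounds to the even neighbour 3092458.5, whereas the naive double result
// sits just below the tie and rounds down to 3092458.25.
void CheckTieCase() {
  const double exact = std::atof("3092458.375");
  assert(exact == 3092458.375);
  assert(static_cast<float>(exact) == 3092458.5f);
  assert(NaiveSumDouble() < exact);
  assert(static_cast<float>(NaiveSumDouble()) == 3092458.25f);
  assert(NaiveSumFloat() == 3092458.25f);
}
```

The takeaway is that a result only has to be off by one unit in the last place to flip a tie, so any scheme that rounds more than once can disagree with a correctly rounded strtod.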

Answer 1

I suggest using the fast_float library (https://github.com/fastfloat/fast_float). It is claimed to be 4 to 10 times faster than strtod (glibc's atof is just a thin wrapper around strtod).
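A minimal sketch of how per-field parsing could call fast_float. fast_float::from_chars takes a [first, last) character range and a double&, like C++17 std::from_chars, and returns a result with ptr and ec members; its results are correctly rounded (nearest, ties to even), as are glibc's strtod results, so the two agree bit for bit. The wrapper name ParseField and the std::strtod fallback used when the header is not available are assumptions for illustration; note that, like std::from_chars, fast_float rejects a leading '+' by default, while strtod also accepts leading whitespace, '+' and hex floats.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <system_error>

#if __has_include("fast_float/fast_float.h")
#include "fast_float/fast_float.h"
#define PARSE_WITH_FAST_FLOAT 1
#endif

// Hypothetical wrapper: parses the whole field [first, last) as a double and
// returns false on malformed input or trailing characters.
bool ParseField(const char* first, const char* last, double& out) {
  assert(first <= last);
#ifdef PARSE_WITH_FAST_FLOAT
  // Correctly rounded, so it matches glibc's strtod (and hence atof).
  auto r = fast_float::from_chars(first, last, out);
  return r.ec == std::errc() && r.ptr == last;
#else
  // Fallback for illustration only: strtod needs a NUL-terminated copy and
  // accepts a slightly wider syntax than from_chars.
  const std::string field(first, last);
  char* end = nullptr;
  out = std::strtod(field.c_str(), &end);
  return !field.empty() && end == field.c_str() + field.size();
#endif
}
```

For CSV data the field boundaries are usually already known (the position of the next comma), so a range-based parser like from_chars can work in place without copying or NUL-terminating each field.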

Answer 2

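As you correctly noticed, the problem is that the intermediate results are already inexact, and these errors accumulate. For example, the intermediate value .2 is already inexact, even though it may be followed by a 5, which would make it .25, a value that is exactly representable.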

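Instead, accumulate the fractional part as an integer (still stored in a floating-point type, but with no fractional part), and divide only once at the end to adjust the exponent: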

  // ...
  if (LIKELY(a[i] != '\0')) {
    T fraction = 0; // fractional part as an integer
    T power = 1;    // turns to 1, 10, 100, 1000, ... each loop iteration
    i++;
    while (a[i]) {
      if (UNLIKELY(a[i] < '0' || a[i] > '9')) {
        return NAN;
      }
      // note: additional logic is required to make sure that trailing
      //       zeros in the fraction cannot decrease the precision
      power *= T{10};
      fraction = fraction * T{10} + (a[i] - '0');
      i++;
    }

    // note: 'power' can also be obtained from a look-up table like in
    //       your original code.
    //       Benchmark to make sure that it's actually faster to use a table.
    res += fraction / power; // perform one division in the end, e.g. 375 / 100
  }
  
  return std::copysign(res, sign); // note: prefer copysign over multiplication
}

See the live example on Compiler Explorer.
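The answer's idea can be pushed one step further so that the result matches std::atof exactly for most inputs. The sketch below swaps in a different, well-known technique, Clinger's fast path (also the first step fast_float takes), rather than the answer's code: all digits, integer and fractional, go into one exact 64-bit integer w, trailing fractional zeros are dropped, and if w <= 2^53 and the number of fractional digits k <= 22, both w and 10^k are exact doubles, so w / 10^k is a single correctly rounded IEEE division, i.e. the same value a correctly rounded strtod returns. Everything else falls back to std::strtod. ParseDoubleExact is a hypothetical name; the sketch assumes IEEE-754 doubles without excess precision, the default rounding mode, a C library whose strtod is correctly rounded (as glibc's is) and the "C" locale (decimal point '.').

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Hypothetical parser: accepts [+-]digits[.digits] with at least one digit,
// returns NaN on invalid input, otherwise the same double as std::atof.
double ParseDoubleExact(const char* s) {
  assert(s != nullptr);
  // 10^0 .. 10^22 are all exactly representable as doubles.
  static const double kPow10[] = {1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
                                  1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
                                  1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
  const std::uint64_t kMaxExact = std::uint64_t{1} << 53;  // exact-integer limit

  const char* p = s;
  const bool negative = (*p == '-');
  if (*p == '-' || *p == '+') ++p;

  std::uint64_t w = 0;    // every significant digit so far, as one integer
  int frac_digits = 0;    // digits of w that lie after the '.'
  int pending_zeros = 0;  // fractional zeros not yet known to be non-trailing
  bool any_digit = false;
  bool exact = true;      // false once w no longer fits the fast path

  // w = w * 10 + d, unless that would exceed 2^53.
  auto push = [&](unsigned d) {
    if (w > (kMaxExact - d) / 10) exact = false;
    else w = w * 10 + d;
  };

  for (; *p >= '0' && *p <= '9'; ++p) {
    any_digit = true;
    if (exact) push(static_cast<unsigned>(*p - '0'));
  }
  if (*p == '.') {
    for (++p; *p >= '0' && *p <= '9'; ++p) {
      any_digit = true;
      if (*p == '0') { ++pending_zeros; continue; }
      // A non-zero digit: the zeros before it are significant after all.
      for (; exact && pending_zeros > 0; --pending_zeros) { push(0); ++frac_digits; }
      if (exact) { push(static_cast<unsigned>(*p - '0')); ++frac_digits; }
    }
  }
  if (!any_digit || *p != '\0') return NAN;  // empty, sign only, or bad char

  if (exact && frac_digits <= 22) {
    const double r = static_cast<double>(w) / kPow10[frac_digits];  // one rounding
    return negative ? -r : r;
  }
  return std::strtod(s, nullptr);  // slow path for long or tiny inputs
}
```

All of the question's sample inputs have at most 13 significant digits once trailing zeros are dropped, so they take the division path; only inputs with roughly 16 or more significant digits, or more than 22 fractional digits, pay for strtod. The cost per field is one integer multiply-add per digit plus one double division, but as the answer says, benchmark before assuming it is faster.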

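Even with these changes, it may be better to use a third-party library, as @cpplearner suggested. Standard library functions such as std::strtof or std::from_chars may not deliver the best performance, and their performance varies between standard library implementations.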

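Although your solution may be faster on some platforms, it may perform worse on platforms where floating-point multiplication and division are more expensive. Floating point is hard.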
