Round to IEEE 754 precision but keep binary format

Question

If I convert the decimal number 3120.0005 to a (32-bit) float representation, the number is rounded down to 3120.00048828125.

Assume we are using a fixed-point number with a scale of 10^12, so that 1000000000000 = 1.0 and 3120000500000000 = 3120.0005.

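What formula/algorithm would round down to the nearest value representable in IEEE 754 single precision, giving 3120000488281250? I also need a way to get the rounded-up result (3120000732421875).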
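As a worked derivation of the numbers in the question (a sketch assuming a positive value in the normal float range): a 32-bit float has a 24-bit significand, so on the interval [2^e, 2^(e+1)) consecutive floats are u = 2^(e-23) apart, and rounding down or up means taking the floor or ceiling of the value divided by u. 3120.0005 lies in [2^11, 2^12), so e = 11, and with x = 3120000500000000 in fixed-point units:

```latex
\begin{aligned}
u &= 2^{e-23} = 2^{11-23} = 2^{-12}, \qquad 10^{12}\,u = 244140625 \\
\frac{x}{10^{12}\,u} &= \frac{3120000500000000}{244140625} = 12779522.048 \\
x_{\downarrow} &= \lfloor 12779522.048 \rfloor \cdot 244140625 = 3120000488281250 \\
x_{\uparrow} &= \lceil 12779522.048 \rceil \cdot 244140625 = 3120000732421875
\end{aligned}
```

Since 10^12 u = 5^12 · 2^(e-11), the float spacing is a whole number of fixed-point units only for e >= 11 (values of at least 2048); below that, neighbouring floats fall between fixed-point steps.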

Answer 1

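If you divide by the decimal scale factor, you get the nearest representable float. To get the neighbour in the other direction, you can use nextafter (nextafterf for float in C, std::nextafter in C++):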

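#include <float.h>
#include <math.h>
#include <stdio.h>

/* Convert a float to fixed point with scale 10^12 (1000000000000 == 1.0).
   The integer and fractional parts are scaled separately. */
long long scale_to_fixed(float f)
{
    float intf = truncf(f);                  /* integer part */
    long long result = 1000000000000LL;
    result *= (long long)intf;
    result += llround((f - intf) * 1.0e12);  /* fractional part, rounded to nearest unit */
    return result;
}

/* Convert fixed point back to the nearest float.
   Not needed: (float)(n / 1.0e12) is always good enough. */
float scale_from_fixed(long long n)
{
    float result = (n % 1000000000000LL) / 1.0e12;  /* fractional part */
    result += n / 1000000000000LL;                   /* integer part */
    return result;
}

int main(void)
{
    long long x = 3120000500000000LL;
    float x_reduced = scale_from_fixed(x);     /* nearest float to x / 10^12 */
    long long y1 = scale_to_fixed(x_reduced);  /* that float, back in fixed point */
    long long yfloor = y1, yceil = y1;

    /* The nearest float may lie on either side of x;
       nextafterf supplies the neighbour on the other side. */
    if (y1 < x) {
        yceil = scale_to_fixed(nextafterf(x_reduced, FLT_MAX));
    }
    else if (y1 > x) {
        yfloor = scale_to_fixed(nextafterf(x_reduced, -FLT_MAX));
    }

    printf("%lld\n%lld\n%lld\n", yfloor, x, yceil);
    return 0;
}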

Output:

3120000488281250
3120000500000000
3120000732421875

Answer 2

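To handle the 1e12-scaled value as a float and find the rounded-up neighbour, e.g. "rounding up (3120000732421875)", the key is to understand that you are looking for the next larger value representable in the 32-bit float representation of x / 1.0e12, i.e. one unit in the last place (here 2^-12) above it. While you can arrive at this value mathematically, a union between float and uint32_t (or unsigned) provides a direct way to interpret the stored 32-bit value of the float as an unsigned integer; for a positive float, adding 1 to that integer gives the next representable float (see footnote 1).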

A simple example, using a union prev to hold the reduced value of x and a separate instance (next) holding its unsigned value + 1, could be:

#include <stdio.h>
#include <inttypes.h>

int main (void) {

    uint64_t x = 3120000500000000;
    union {                         /* union between float and uint32_t */
        float f;
        uint32_t u;
    } prev = { .f = x / 1.0e12 },   /* x reduced to float; .u is its bit pattern */
      next = { .u = prev.u + 1u };  /* bit pattern + 1: next float up (x > 0) */

    printf ("prev : %" PRIu64 "\n   x : %" PRIu64 "\nnext : %" PRIu64 "\n",
            (uint64_t)(prev.f * 1e12), x, (uint64_t)(next.f * 1e12));
    return 0;
}

Example Use/Output

$ ./bin/pwr2_prev_next
prev : 3120000488281250
   x : 3120000500000000
next : 3120000732421875

Footnote:

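1. As an alternative, you can use a pointer to char to access the bytes stored at the address of the float and interpret that 4-byte value as unsigned without running afoul of C11 Standard - §6.5 Expressions (p6,7) (the "strict aliasing rule"), but using a union is preferred.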
