How to correctly round to the nearest integer in double-double arithmetic


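I have to analyze a large amount of data using Python 3 (PyPy implementation), where I perform some operations on fairly large floats and must check whether the results are close enough to integers.

As an example, say I'm generating random pairs of numbers and checking whether they form Pythagorean triples (the sides of a right triangle with integer side lengths):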

from math import hypot
from pprint import pprint
from random import randrange
from time import time

def gen_rand_tuples(start, stop, amount):
    '''
    Generates random integer pairs and converts them to tuples of floats.
    '''
    for _ in range(amount):
        yield (float(randrange(start, stop)), float(randrange(start, stop)))

t0 = time()
## Results are those pairs that result in integer hypotenuses, or
## at least very close ones, to within 1e-12.
results = [t for t in gen_rand_tuples(1, 2**32, 10_000_000) if abs((h := hypot(*t)) - int(h)) < 1e-12]
print('Results found:')
pprint(results)
print('finished in:', round(time() - t0, 2), 'seconds.')

Running it, I get:

Python 3.9.17 (a61d7152b989, Aug 13 2023, 10:27:46)
[PyPy 7.3.12 with GCC 13.2.1 20230728 (Red Hat 13.2.1-1)] on linux
Type "help", "copyright", "credits" or "license()" for more information.
>>> 
===== RESTART: /home/user/Downloads/pythagorean_test_floats.py ====
Results found:
[(2176124225.0, 2742331476.0),
 (342847595.0, 3794647043.0),
 (36.0, 2983807908.0),
 (791324089.0, 2122279232.0)]
finished in: 2.64 seconds.

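Interesting: it runs fast, processing 10 million data points in a little over 2 seconds, and I even found some matching pairs. The hypotenuses appear to be integers: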

>>> pprint([hypot(*x) for x in results])
[3500842551.0, 3810103759.0, 2983807908.0, 2265008378.0]

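But they are not. If we check the results with the arbitrary-precision decimal module, we see that they are actually not close enough to integers: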

>>> from decimal import Decimal
>>> pprint([(x[0]*x[0] + x[1]*x[1]).sqrt() for x in (tuple(map(Decimal, x)) for x in results)])
[Decimal('3500842551.000000228516418075'),
 Decimal('3810103758.999999710375341513'),
 Decimal('2983807908.000000217172157183'),
 Decimal('2265008377.999999748566051441')]

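So I think the problem is that the numbers are large enough to fall in the range where Python floats lack precision, and so false positives are returned.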
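
To make that concrete, here is a small sketch (assuming Python 3.9+, where `math.ulp` is available) that prints the spacing between adjacent floats near one of the hypotenuses above. The spacing is far larger than the 1e-12 tolerance, so at this magnitude the test can only pass when the float is exactly an integer:

```python
import math

# One of the hypotenuses reported by the float version above.
h = 3500842551.0

# Distance from h to the next representable float.
spacing = math.ulp(h)

print(spacing)          # 4.76837158203125e-07, i.e. 2**-21
print(spacing > 1e-12)  # True: the 1e-12 tolerance is finer than the float grid
```

In other words, any true value within about half that spacing of an integer collapses onto the integer itself once it is stored as a float.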

Now, we can simply change the program to use arbitrary-precision decimals everywhere:

from decimal import Decimal
from pprint import pprint
from random import randrange
from time import time

def dec_hypot(x, y):
    return (x*x + y*y).sqrt()

def gen_rand_tuples(start, stop, amount):
    '''
    Generates random integer pairs and converts them to tuples of decimals.
    '''
    for _ in range(amount):
        yield (Decimal(randrange(start, stop)), Decimal(randrange(start, stop)))

t0 = time()
## Results are those pairs that result in integer hypotenuses, or
## at least very close ones, to within 1e-12.
## The tolerance is built from a string so it is exactly 1e-12, not the nearest binary float.
results = [t for t in gen_rand_tuples(1, 2**32, 10_000_000) if abs((h := dec_hypot(*t)) - h.to_integral_value()) < Decimal('1e-12')]
print('Results found:')
pprint(results)
print('finished in:', round(time() - t0, 2), 'seconds.')

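Now we get no false positives, but we take a big performance hit. What used to take a little over 2 seconds now takes over 100 seconds. It seems decimals are not JIT-friendly: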

====== RESTART: /home/user/Downloads/pythagorean_test_dec.py ======
Results found:
[]
finished in: 113.82 seconds.

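I found this answer to the question "CPython and PyPy Decimal operation performance", which suggests using double-double numbers as a faster, JIT-friendly alternative to decimals that still gives better precision than built-in floats. So I pip-installed the third-party doubledouble module and changed the program accordingly: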

from doubledouble import DoubleDouble
from decimal import Decimal
from pprint import pprint
from random import randrange
from time import time

def dd_hypot(x, y):
    return (x*x + y*y).sqrt()

def gen_rand_tuples(start, stop, amount):
    for _ in range(amount):
        yield (DoubleDouble(randrange(start, stop)), DoubleDouble(randrange(start, stop)))

t0 = time()
print('Results found:')
results = [t for t in gen_rand_tuples(1, 2**32, 10_000_000) if abs((h := dd_hypot(*t)) - int(h)) < DoubleDouble(1e-12)]
pprint(results)
print('finished in:', round(time() - t0, 2), 'seconds.')

But I get this error:

======= RESTART: /home/user/Downloads/pythagorean_test_dd.py ======
Results found:
Traceback (most recent call last):
  File "/home/user/Downloads/pythagorean_test_dd.py", line 24, in <module>
    results = [t for t in gen_rand_tuples(1, 2**32, 10_000_000) if abs((h := dd_hypot(*t)) - int(h)) < DoubleDouble(1e-12)]
  File "/home/user/Downloads/pythagorean_test_dd.py", line 24, in <listcomp>
    results = [t for t in gen_rand_tuples(1, 2**32, 10_000_000) if abs((h := dd_hypot(*t)) - int(h)) < DoubleDouble(1e-12)]
TypeError: int() argument must be a string, a bytes-like object or a number, not 'DoubleDouble'

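I think the problem is that the module doesn't define a conversion to int or a round-to-nearest-integer method. The best I could write is an extremely contrived "int" function that rounds a double-double to the nearest integer by round-tripping through string and decimal and then back to DoubleDouble: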

def contrived_int(dd):
    rounded_str = (Decimal(dd.x) + Decimal(dd.y)).to_integral_value()
    hi = float(rounded_str)
    lo = float(Decimal(rounded_str) - Decimal(hi))
    return DoubleDouble(hi, lo)

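But it is very roundabout, defeats the purpose of avoiding decimals, and makes the program even slower than the all-decimal version.

So I ask: is there a fast way to round a double-double number directly to the nearest integer, without the intermediate step of going through decimals or strings?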
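
For reference, one possible shape of such a function is sketched below. It works on the two plain-float components only. The names `dd_round_nearest` and `_round_half_up` are hypothetical and not part of the doubledouble module. The sketch assumes the pair is normalized (`hi` is the float nearest to `hi + lo`) and rounds exact ties upward, which differs from the round-half-to-even rule of Python's built-in `round`:

```python
import math

def _round_half_up(x):
    """Round one float to the nearest integer-valued float, ties going up."""
    f = float(math.floor(x))
    return f + 1.0 if x - f >= 0.5 else f

def dd_round_nearest(hi, lo):
    """Round the value hi + lo to the nearest integer.

    Returns a (hi, lo) pair of floats. Assumes the input is a normalized
    double-double, i.e. hi is the float closest to hi + lo.
    """
    r = _round_half_up(hi)
    if r == hi:
        # hi is already an integer, so any fractional part lives in lo.
        rl = _round_half_up(lo)
        # Renormalize the two integer-valued parts (quick two-sum).
        s = r + rl
        e = rl - (s - r)
        return s, e
    # hi is not an integer. lo is too small to move the value across a
    # rounding boundary unless hi sits exactly on a .5 tie.
    if r - hi == 0.5 and lo < 0.0:
        r -= 1.0  # the true value is just below the tie, so round down
    return r, 0.0

print(dd_round_nearest(2.5, -1e-20))            # (2.0, 0.0)
print(dd_round_nearest(2.5, 1e-20))             # (3.0, 0.0)
print(dd_round_nearest(3500842551.0, 2.285e-7)) # (3500842551.0, 0.0)
```

If the module's objects expose their components as `.x` and `.y` and accept `DoubleDouble(hi, lo)`, as the `contrived_int` code above suggests, the result could presumably be wrapped as `DoubleDouble(*dd_round_nearest(h.x, h.y))`. That wiring is an assumption and has not been checked against the library.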

python-3.x precision floating-accuracy pypy double-double-arithmetic
1 Answer

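Not an answer to the question you asked directly, but here is at least one way to check whether an integer of any size is a perfect square (I'm sure there are faster ways, but this should always work and has logarithmic complexity):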

def is_square(n):
    low = 0
    high = 1
    while high * high <= n:
        low = high
        high *= 2
    while low < high:
        mid = (low + high) >> 1
        if mid * mid == n:
            return True
        if mid * mid > n:
            high = mid
        else:
            low = mid + 1
    return False

This is just a binary search.
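
As a possible shortcut, and assuming Python 3.8 or later, the standard library's `math.isqrt` computes the exact integer square root, so the same perfect-square test can be written without a hand-rolled search. The helper name `is_pythagorean_pair` is made up for this sketch, and it only applies when the inputs are integers to begin with, as in the question's example:

```python
from math import isqrt

def is_pythagorean_pair(a, b):
    """Return True if a*a + b*b is a perfect square (exact integer arithmetic)."""
    n = a * a + b * b
    r = isqrt(n)       # floor of the exact square root, no floats involved
    return r * r == n

print(is_pythagorean_pair(3, 4))                    # True
print(is_pythagorean_pair(2176124225, 2742331476))  # False: a false positive of the float version
```

Because Python integers are arbitrary precision, this avoids the rounding problem entirely instead of narrowing the tolerance.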
