Why do operations on large integers silently overflow?

Question

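I have a list containing very large integers that I want to convert into a pandas column with a specific dtype. For example, if the list contains 2**31, which is outside the range of the int32 dtype, casting it to int32 raises an OverflowError, which tells me to use another dtype or handle the number some other way beforehand.

However, if a number is large but still within the dtype's range (e.g. 2**31-1) and something is added to it that pushes the value past the limit, the operation does not raise an OverflowError. It runs without any error, but the value has wrapped around and is now a completely wrong number.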

import pandas as pd
pd.Series([2**31], dtype='int32')        # <--- OverflowError: Python int too large to convert to C long

pd.Series([2**31-1], dtype='int32') + 1

0   -2147483648
dtype: int32

Why does this happen? Why doesn't it raise an error as in the first case?

P.S. I'm using pandas 2.1.0 on Python 3.11.5.

Answer 1

Let's take a look:

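import pandas as pd
s = pd.Series([2**31-1], dtype='int32')
print(type(s[0]))                                                # a single element of the Series
print(type((pd.Series([2**31-1], dtype='int32') + 1)[0]))        # element of Series + 1
print(type(s[0] + 1))                                            # scalar element + 1
print(pd.Series([1,2,3], dtype='int32') + 1)                     # Series + 1 keeps the dtype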

<class 'numpy.int32'>
<class 'numpy.int32'>
<class 'numpy.int64'>
0    2
1    3
2    4
dtype: int32

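Pandas performs the addition on the Series and enforces the Series dtype. NumPy takes over when you access an element of the Series (type: numpy.int32) and perform the addition on that scalar. In that case NumPy promotes the result to numpy.int64, which avoids the overflow.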
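To make the dtype point concrete, here is a minimal sketch using the int32 Series from the question. The names `wrapped` and `widened` are illustrative only. It shows that element-wise addition on the Series stays in int32 and wraps, while widening to int64 first leaves enough room for the result.

```python
import pandas as pd

s = pd.Series([2**31 - 1], dtype="int32")

# Series + int keeps the Series dtype, so the result wraps around in int32.
wrapped = s + 1

# Widening the column before the addition leaves room for the true result.
widened = s.astype("int64") + 1

print(wrapped.dtype, wrapped.iloc[0])   # int32 -2147483648
print(widened.dtype, widened.iloc[0])   # int64 2147483648
```

Whether widening is acceptable depends on memory constraints and on how large the values can actually get; int64 has its own limit as well.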

Answer 2

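Not 100% sure, but as an educated guess:

The first overflow happens at the boundary between Python and C, and it is detected during the conversion. The second overflow, however, happens entirely inside C, where there is no integer overflow check.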
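If a hard error is preferred over silent wraparound, one option is to do the range check yourself before the arithmetic reaches C. The helper below, `checked_add`, is hypothetical (it is not part of pandas or NumPy). It is only a sketch and assumes a non-empty Series with a NumPy integer dtype and a plain Python int as the operand.

```python
import numpy as np
import pandas as pd


def checked_add(series, value):
    """Add a Python int to an integer Series, raising instead of wrapping.

    Hypothetical helper for illustration; assumes a non-empty Series
    backed by a NumPy integer dtype.
    """
    info = np.iinfo(series.dtype)
    # Python ints have arbitrary precision, so these sums cannot overflow.
    lowest = int(series.min()) + value
    highest = int(series.max()) + value
    if lowest < info.min or highest > info.max:
        raise OverflowError(
            f"adding {value} would leave the range of {series.dtype} "
            f"[{info.min}, {info.max}]"
        )
    return series + value
```

With this sketch, `checked_add(pd.Series([2**31 - 1], dtype="int32"), 1)` raises an OverflowError, while values that stay in range are added as usual.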
