Why is pandas '==' different from '.eq()'?

Question

Consider this Series:

s = pd.Series([(1, 2), (3, 4), (5, 6)])

This works as expected:

s == (3, 4)

0    False
1     True
2    False
dtype: bool

This does not:

s.eq((3, 4))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

ValueError: Lengths must be equal

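I assumed they were the same. What is the difference between them?

What do the docs say?

Equivalent to series == other, but with support to substitute a fill_value for missing data in one of the inputs.

This seems to imply that they should work the same, hence the confusion.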
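To make the `fill_value` part of that sentence concrete, here is a minimal sketch with scalar data, where `==` and `.eq()` do agree apart from the handling of missing values. The Series `a` and `b` are made up for illustration.

```python
import pandas as pd

a = pd.Series([1.0, None, 3.0])   # the middle value is missing (NaN)
b = pd.Series([1.0, 0.0, 3.0])

plain = a == b                     # NaN never compares equal to anything
filled = a.eq(b, fill_value=0.0)   # the NaN in `a` is replaced by 0.0 first

print(plain.tolist())    # [True, False, True]
print(filled.tolist())   # [True, True, True]
```

The only difference here comes from `fill_value`; without it, both calls return the same result for these inputs.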

Answer 1

What you are running into is actually a special case that makes it easier to compare a `pandas.Series` or `numpy.ndarray` with ordinary Python constructs. The source code reads:

def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
    # validate axis
    if axis is not None:
        self._get_axis_number(axis)
    if isinstance(other, ABCSeries):
        return self._binop(other, op, level=level, fill_value=fill_value)
    elif isinstance(other, (np.ndarray, list, tuple)):
        if len(other) != len(self):
            # ---------------------------------------
            # you never reach the `==` path because you get into this.
            # ---------------------------------------
            raise ValueError('Lengths must be equal')  
        return self._binop(self._constructor(other, self.index), op,
                           level=level, fill_value=fill_value)
    else:
        if fill_value is not None:
            self = self.fillna(fill_value)

        return self._constructor(op(self, other),
                                 self.index).__finalize__(self)

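You are hitting the `ValueError` because pandas assumes that the value passed to `.eq` should be converted to a `numpy.ndarray` or `pandas.Series` (if you give it an array, list, or tuple) instead of being compared as a `tuple`. For example, if you have: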

s = pd.Series([1,2,3])
s.eq([1,2,3])

you would not want it to compare each element with the whole list `[1,2,3]`.
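The branch quoted above can be mimicked in plain Python. The sketch below is a simplified model and not pandas code; `elementwise_eq` and `flex_eq` are hypothetical names chosen for illustration.

```python
def elementwise_eq(values, other):
    """Compare every element with `other` taken as one whole value."""
    return [v == other for v in values]


def flex_eq(values, other):
    """Toy version of the list-like branch in the quoted source."""
    if isinstance(other, (list, tuple)):
        # A list or tuple is read as "one value per row", so lengths must match.
        if len(other) != len(values):
            raise ValueError("Lengths must be equal")
        return [v == o for v, o in zip(values, other)]
    # Anything else is treated as a scalar and compared with every row.
    return [v == other for v in values]


rows = [(1, 2), (3, 4), (5, 6)]

print(elementwise_eq(rows, (3, 4)))              # [False, True, False]
print(flex_eq(rows, ((1, 2), (3, 4), (5, 6))))   # [True, True, True]

try:
    flex_eq(rows, (3, 4))                        # 2 values for 3 rows
except ValueError as exc:
    print(exc)                                   # Lengths must be equal
```

In this model the tuple `(3, 4)` never gets the chance to be compared as a single value, because the type check on `other` comes first.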

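The problem is that `object` arrays (as with `dtype=uint`) often slip through the cracks or are neglected on purpose. A simple `if self.dtype != 'object'` branch inside that method could resolve this. But maybe the developers had good reasons to make this case different. I would suggest asking for clarification by posting on their bug tracker.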

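You have not asked how to make it work correctly, but for completeness I will include one possibility (judging by the source code, it seems you need to wrap the value in a `pandas.Series` yourself):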

>>> s.eq(pd.Series([(1, 2)]))
0     True
1    False
2    False
dtype: bool
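If the aim is to compare every row with the same tuple, the following sketch shows two options that avoid passing a bare tuple to `.eq`: repeating the tuple in a Series that shares the index of `s`, or applying a plain Python comparison per row. It assumes an object-dtype Series of tuples as in the question; the names `target`, `wrapped`, `via_eq`, and `via_apply` are illustrative, and details may vary between pandas versions.

```python
import pandas as pd

s = pd.Series([(1, 2), (3, 4), (5, 6)])
target = (3, 4)

# Option 1: one copy of the tuple per row, aligned on the index of s.
wrapped = pd.Series([target] * len(s), index=s.index)
via_eq = s.eq(wrapped)

# Option 2: plain Python equality applied row by row.
via_apply = s.apply(lambda row: row == target)

print(via_eq.tolist())      # [False, True, False]
print(via_apply.tolist())   # [False, True, False]
```

The second option is slower on large data because it calls a Python function per row, but it makes the intent explicit.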

Answer 2

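`==` is an element-wise comparison that yields a vector of truth values, whereas `.eq` asks "are these two iterables equal" and requires them to have the same length. Ayhan points out one exception: when you compare a pandas vector type using `.eq(scalar value)`, the scalar value is simply broadcast to a vector of the same size for the comparison.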

Answer 3

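In short

`==` and `Series.eq()` are not equivalent at all, contrary to what the documentation states:

`==` compares every row with the same value, here a single tuple, which gives `[False, True, False]` for the series in your example.
`.eq` compares each row with its own value, here one tuple per row, passed as a tuple of the right length, which gives `[True, True, True]`.

When the series elements are scalars, the two do behave the same (and the documentation is correct), but that is not the general case.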

Details

`==` is a Python operator that compares two values; for a Series it relies on the corresponding `Series.__eq__()` to perform the comparison. `Series.eq()` is a pandas function.

So this boils down to: do `Series.__eq__(other)` and `Series.eq(other)` have equivalent functionality? They do not, even though `Series.__eq__(other)` triggers an individual `==` on each row of the series.
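The operator dispatch itself can be observed with a small stand-in class. `Column` below is hypothetical and has nothing to do with the pandas implementation; it only records which method Python invokes.

```python
class Column:
    """Hypothetical stand-in for a Series; not pandas code."""

    def __init__(self, values):
        self.values = list(values)
        self.calls = []  # names of the methods that were invoked

    def __eq__(self, other):
        # Python calls this for `column == other`.
        self.calls.append("__eq__")
        return [v == other for v in self.values]

    def eq(self, other):
        # An ordinary method: `==` never reaches it, so it is free
        # to interpret `other` differently (here: one value per row).
        self.calls.append("eq")
        return [v == o for v, o in zip(self.values, other)]


col = Column([(1, 2), (3, 4), (5, 6)])
first = col == (3, 4)
second = col.eq(((1, 2), (3, 4), (5, 6)))

print(first)       # [False, True, False]
print(second)      # [True, True, True]
print(col.calls)   # ['__eq__', 'eq']
```

Nothing in the language ties the two methods together, so any equivalence between them is a decision of the library author.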

Experiment

The difference in behavior can be seen like this:

s = pd.Series([(1, 2), (3, 4), (5, 6)])
print(s == (3,4)) # <-- row == single tuple (3,4)
print(s == ((1,2),(3,4),(5,6))) # <-- row == single tuple of tuples
print(s.eq((3,4))) # <-- not valid, 2 values provided, expecting 3
print(s.eq(((1,2),(3,4),(5,6)))) # <-- each row == each tuple

The second `==` is meaningless, and the third statement raises:

ValueError: Lengths must be equal

The others are meaningful:

other = (3,4)
s == other # <-- legitimate
0    False
1     True
2    False
dtype: bool

other = ((1,2),(3,4),(5,6))
s == other # <-- meaningless
0    False
1    False
2    False
dtype: bool

s.eq(other) # <-- legitimate
0    True
1    True
2    True
dtype: bool

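This leaves us with two legitimate statements, but they do not serve the same purpose.

Particular case

When the series elements are scalars, `other` in `Series.eq(other)` is a scalar too. It is broadcast to the length of the series, so each row is effectively compared with the same scalar. In that case, both comparisons give the same result: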

s = pd.Series([1,3,5])
print(s == 3) # <-- scalar
print(s.eq(3)) # <-- scalar broadcasts

s == 3
0    False
1     True
2    False
dtype: bool

s.eq(3)
0    False
1     True
2    False
dtype: bool

This apparent similarity does not hold when the elements are not scalars.
