Element-wise XOR in pandas

Question

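I know that logical AND is & and logical OR is | for pandas Series, but I've been looking for an element-wise logical XOR. I suppose I could express it in terms of AND and OR, but I'd prefer to use an XOR if one exists.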

Thanks!
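For reference, here is a minimal sketch of both routes the question mentions, using two small made-up boolean Series. It shows XOR spelled with AND, OR and NOT alongside the ^ operator. For boolean data, a != b gives the same result as well.

```python
import pandas as pd

# Two small boolean Series with the same index (arbitrary example data)
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

# XOR built from AND, OR and NOT: true where exactly one operand is true
xor_via_and_or = (a | b) & ~(a & b)

# The ^ operator on boolean Series does the same thing directly
xor_direct = a ^ b

# For booleans, "not equal" is also XOR
xor_via_ne = a != b

print(xor_direct.tolist())                  # [False, True, True, False]
print(xor_via_and_or.equals(xor_direct))    # True
print(xor_via_ne.equals(xor_direct))        # True
```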

Answer 1

Python XOR:

a ^ b

NumPy logical XOR:

np.logical_xor(a,b)
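As a quick check, the following sketch uses made-up boolean Series (not the NumPy arrays from the timings below) to show that the two spellings agree and that np.logical_xor returns a pandas Series. It assumes boolean dtype throughout. On integer data, ^ performs a bitwise XOR rather than a logical one.

```python
import numpy as np
import pandas as pd

a = pd.Series([True, False, True, False])
b = pd.Series([True, True, False, False])

via_operator = a ^ b               # pandas dispatches ^ to an element-wise XOR
via_numpy = np.logical_xor(a, b)   # NumPy ufunc; pandas wraps the result back into a Series

print(type(via_numpy).__name__)           # Series
print(via_operator.tolist())              # [False, True, True, False]
print((via_operator == via_numpy).all())  # True
```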

Performance comparison (the two are essentially equally fast):

1. Random boolean sequences of size 10000

In [7]: a = np.random.choice([True, False], size=10000)
In [8]: b = np.random.choice([True, False], size=10000)

In [9]: %timeit a ^ b
The slowest run took 7.61 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

In [10]: %timeit np.logical_xor(a,b)
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

2. Random boolean sequences of size 1000

In [11]: a = np.random.choice([True, False], size=1000)
In [12]: b = np.random.choice([True, False], size=1000)

In [13]: %timeit a ^ b
The slowest run took 21.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

In [14]: %timeit np.logical_xor(a,b)
The slowest run took 19.45 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

3. Random boolean sequences of size 100

In [15]: a = np.random.choice([True, False], size=100)
In [16]: b = np.random.choice([True, False], size=100)

In [17]: %timeit a ^ b
The slowest run took 33.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 614 ns per loop

In [18]: %timeit np.logical_xor(a,b)
The slowest run took 45.49 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 616 ns per loop

4. Random boolean sequences of size 10

In [19]: a = np.random.choice([True, False], size=10)
In [20]: b = np.random.choice([True, False], size=10)

In [21]: %timeit a ^ b
The slowest run took 86.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 509 ns per loop

In [22]: %timeit np.logical_xor(a,b)
The slowest run took 40.94 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 511 ns per loop
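Note that the timings above come from IPython's %timeit on NumPy arrays, since np.random.choice returns an ndarray rather than a pandas Series. Below is a rough sketch for re-running the comparison as a plain script with the standard timeit module. The helper name time_xor and the seeded generator are assumptions of this sketch, and absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np


def time_xor(size, number=10_000, seed=0):
    """Return average seconds per call for a ^ b and np.logical_xor(a, b)."""
    rng = np.random.default_rng(seed)  # seeded, unlike the original session
    a = rng.choice([True, False], size=size)
    b = rng.choice([True, False], size=size)

    # Both spellings must agree before their speed is worth comparing
    assert np.array_equal(a ^ b, np.logical_xor(a, b))

    t_op = timeit.timeit(lambda: a ^ b, number=number) / number
    t_np = timeit.timeit(lambda: np.logical_xor(a, b), number=number) / number
    return t_op, t_np


for size in (10_000, 1_000, 100, 10):
    t_op, t_np = time_xor(size)
    print(f"size={size:>6}: a ^ b {t_op * 1e6:.2f} us, logical_xor {t_np * 1e6:.2f} us")
```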

Answer 2

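I found one way in which a^b and np.logical_xor(a,b) are not equivalent. It really tripped me up, but in the end it was a simple fix. Hopefully this saves someone else a headache.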

I recently upgraded from pandas 0.25.3 to 2.0.3 (and numpy from 1.19.0 to 1.24.4), which triggered this issue.

Let a be a DataFrame of bool with duplicates on its Index. Let b be a Series of bool, where b.index == a.columns.

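My intention is to broadcast b across a and compute an element-wise XOR of b with each row of a, where any duplicates in a.index should carry through to the output.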
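To make the setup concrete, here is a sketch with hypothetical data: three timestamp rows, one of them duplicated, and two made-up columns x and y. The intended output is computed positionally with plain NumPy broadcasting, which sidesteps pandas label alignment entirely.

```python
import pandas as pd

# Hypothetical data: a boolean DataFrame whose DatetimeIndex has a duplicate...
idx = pd.DatetimeIndex(["2023-01-01", "2023-01-02", "2023-01-02"])
a = pd.DataFrame({"x": [True, False, True], "y": [False, False, True]}, index=idx)

# ...and a boolean Series whose index matches a.columns
b = pd.Series([True, False], index=["x", "y"])
assert b.index.equals(a.columns)

# Intended result, computed positionally: b's values (shape (2,)) are
# broadcast against every row of a (shape (3, 2)); a's index is reattached
expected = pd.DataFrame(a.to_numpy() ^ b.to_numpy(), index=a.index, columns=a.columns)
print(expected)
```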

This code worked on my old setup...

np.logical_xor(a,b.to_frame().T)

...but fails on my new setup:

TypeError: '<' not supported between instances of 'Timestamp' and 'int'

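I believe this happens because something in the broadcasting tries to concatenate the transposed b (whose index, b.to_frame().T.index, is a meaningless [0]) with a (which has a timestamp index), and then, I believe, sorts the result to make it monotonic.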
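One quick way to see the mismatch described above is to reuse the same hypothetical a and b. After to_frame().T, the unnamed Series becomes a one-row DataFrame whose only row label is 0, while a is indexed by timestamps, so the two operands share no row label to align on. This sketch only inspects the operands. It does not try to reproduce the TypeError, which depends on the pandas version.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2023-01-01", "2023-01-02", "2023-01-02"])
a = pd.DataFrame({"x": [True, False, True], "y": [False, False, True]}, index=idx)
b = pd.Series([True, False], index=["x", "y"])  # unnamed Series

# An unnamed Series becomes column 0 in to_frame(); transposing turns that into row label 0
b_row = b.to_frame().T

print(b_row.index.tolist())                 # [0]
print(list(b_row.columns))                  # ['x', 'y']
print(isinstance(a.index, pd.DatetimeIndex))  # True
```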

The solution, which this question prompted me to think of 🙏:

a^b

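Annoyingly (or wonderfully), this also seems to work on my old pandas/numpy "production" setup. Coincidentally, this was the first time I used git blame. The answer: "Initial commit", 3 years ago 🤣. So either a^b didn't work in older versions of pandas, or I just didn't know about it.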
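Here is a sketch of the fix on the same hypothetical data. DataFrame ^ Series matches b's index against a's columns and applies b to every row, so the duplicated timestamp passes through to the output unchanged.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2023-01-01", "2023-01-02", "2023-01-02"])
a = pd.DataFrame({"x": [True, False, True], "y": [False, False, True]}, index=idx)
b = pd.Series([True, False], index=["x", "y"])

# DataFrame-with-Series operators align the Series index with the DataFrame's
# columns by default; a's row index, duplicates included, is kept as-is
result = a ^ b
print(result)
```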
