Element-wise XOR in pandas


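I know that logical AND is & and logical OR is | on a pandas Series, but I am looking for an element-wise logical XOR. I suppose I could express it in terms of AND and OR, but I would rather use XOR if one is available.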

Thanks!

python pandas logic xor
2 Answers
22 votes

Python XOR:

a ^ b

NumPy logical XOR:

np.logical_xor(a,b)
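As a quick sanity check, here is a minimal sketch on two small boolean Series. The sample values are made up for illustration; the third line spells out the AND/OR formulation mentioned in the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data covering all four rows of the truth table
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

xor_operator = a ^ b              # element-wise XOR via the operator
xor_numpy = np.logical_xor(a, b)  # element-wise XOR via the NumPy ufunc
xor_and_or = (a | b) & ~(a & b)   # the same result spelled with AND, OR and NOT

print(xor_operator.tolist())  # [False, True, True, False]
```

The two spellings agree on boolean data. On integer data they differ: ^ is a bitwise XOR, while np.logical_xor treats any non-zero value as True.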

Testing performance, the result is the same for both:

1. Random boolean sequences of size 10000

In [7]: a = np.random.choice([True, False], size=10000)
In [8]: b = np.random.choice([True, False], size=10000)

In [9]: %timeit a ^ b
The slowest run took 7.61 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

In [10]: %timeit np.logical_xor(a,b)
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

2. Random boolean sequences of size 1000

In [11]: a = np.random.choice([True, False], size=1000)
In [12]: b = np.random.choice([True, False], size=1000)

In [13]: %timeit a ^ b
The slowest run took 21.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

In [14]: %timeit np.logical_xor(a,b)
The slowest run took 19.45 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

3. Random boolean sequences of size 100

In [15]: a = np.random.choice([True, False], size=100)
In [16]: b = np.random.choice([True, False], size=100)

In [17]: %timeit a ^ b
The slowest run took 33.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 614 ns per loop

In [18]: %timeit np.logical_xor(a,b)
The slowest run took 45.49 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 616 ns per loop

4. Random boolean sequences of size 10

In [19]: a = np.random.choice([True, False], size=10)
In [20]: b = np.random.choice([True, False], size=10)

In [21]: %timeit a ^ b
The slowest run took 86.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 509 ns per loop

In [22]: %timeit np.logical_xor(a,b)
The slowest run took 40.94 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 511 ns per loop
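
The %timeit magic only works inside IPython. To repeat the comparison from a plain script, something like the following sketch with the standard timeit module should do. The helper name time_xor, the seed, and the loop counts are my own choices, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np


def time_xor(size, number=1000, repeat=3):
    """Return the best per-call time in seconds for a ^ b and np.logical_xor(a, b)."""
    rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
    a = rng.choice([True, False], size=size)
    b = rng.choice([True, False], size=size)
    # Both spellings must produce the same array before they are timed
    assert np.array_equal(a ^ b, np.logical_xor(a, b))
    t_op = min(timeit.repeat(lambda: a ^ b, number=number, repeat=repeat)) / number
    t_np = min(timeit.repeat(lambda: np.logical_xor(a, b), number=number, repeat=repeat)) / number
    return t_op, t_np


if __name__ == "__main__":
    for size in (10, 100, 1000, 10000):
        t_op, t_np = time_xor(size)
        print(f"size={size:>5}: a ^ b {t_op * 1e6:.2f} us, "
              f"np.logical_xor {t_np * 1e6:.2f} us per call")
```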

0 votes

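I found a case where a^b and np.logical_xor(a,b) are not equivalent. It really tripped me up, but the fix turned out to be simple. Hopefully this saves someone else a headache.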

I recently upgraded from pandas 0.25.3 to 2.0.3 (and numpy from 1.19.0 to 1.24.4), which is what surfaced the problem.

Let a be a DataFrame of bool with duplicates in its Index. Let b be a Series of bool, where b.index == a.columns.

My intent is to broadcast b across a and XOR each row of a element-wise with b, with any duplicates in a.index carried through to the output.

This code worked on my old setup...

np.logical_xor(a,b.to_frame().T)

...but fails on my new setup:

TypeError: '<' not supported between instances of 'Timestamp' and 'int'

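I believe this happens because something in the broadcasting tries to align the transposed b (whose index is a meaningless [0]) with a (which has a Timestamp index), and I believe it then tries to sort the combined index to make it monotonic.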

The solution, which this question prompted me to consider 🙏, is:

a^b
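
To make the setup concrete, here is a minimal sketch. The index values and column names are invented, and I have not tried to reproduce the TypeError itself, since that depends on the installed versions. It checks that a ^ b applies b to every row and keeps the duplicated index entries, and cross-checks the result against plain NumPy broadcasting.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a bool DataFrame whose Timestamp index contains a duplicate
idx = pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02"])
a = pd.DataFrame(
    [[True, False], [False, False], [True, True]],
    index=idx,
    columns=["x", "y"],
)
# A bool Series whose index matches a.columns
b = pd.Series([True, False], index=["x", "y"])

# DataFrame ^ Series matches b.index against a.columns and applies b to each row
result = a ^ b

# Cross-check on the raw arrays, which involves no index alignment at all
expected = np.logical_xor(a.to_numpy(), b.to_numpy())

print(result)
print(result.index.equals(a.index))             # duplicates are carried through
print((result.to_numpy() == expected).all())
```

If the labels ever get in the way, operating on the raw arrays and wrapping the result back into a DataFrame with a.index and a.columns is another option.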

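Annoyingly/wonderfully, this also seems to work on my old pandas/numpy "production" setup. Incidentally, this was the first time I have ever used "git blame". The answer: "initial commit", 3 years ago 🤣, so either a^b did not work in even older versions of pandas, or I simply did not know about it.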
