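I know that logical AND is & and logical OR is | for a Pandas Series, but I have been looking for an element-wise logical XOR. I suppose I could express it in terms of AND and OR, but I would prefer to use an XOR if one is available.
Thanks!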
Python XOR:
a ^ b
NumPy logical XOR:
np.logical_xor(a, b)
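As a minimal sketch of the two spellings on boolean Series (the sample values are made up for illustration), the block below also includes the AND/OR formulation mentioned in the question so the three can be compared:

```python
import numpy as np
import pandas as pd

# Illustrative data covering all four truth-table combinations.
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

xor_operator = a ^ b                 # element-wise XOR via the operator
xor_numpy = np.logical_xor(a, b)     # element-wise XOR via NumPy
xor_manual = (a | b) & ~(a & b)      # XOR spelled out with AND, OR and NOT

print(xor_operator.tolist())  # [False, True, True, False]
```

All three expressions should give the same values here because both Series are boolean and share the same index.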
Testing performance - the results are about the same:
1. Random boolean sequences of size 10000
In [7]: a = np.random.choice([True, False], size=10000)
In [8]: b = np.random.choice([True, False], size=10000)
In [9]: %timeit a ^ b
The slowest run took 7.61 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop
In [10]: %timeit np.logical_xor(a,b)
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop
2. Random boolean sequences of size 1000
In [11]: a = np.random.choice([True, False], size=1000)
In [12]: b = np.random.choice([True, False], size=1000)
In [13]: %timeit a ^ b
The slowest run took 21.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop
In [14]: %timeit np.logical_xor(a,b)
The slowest run took 19.45 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop
3. Random boolean sequences of size 100
In [15]: a = np.random.choice([True, False], size=100)
In [16]: b = np.random.choice([True, False], size=100)
In [17]: %timeit a ^ b
The slowest run took 33.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 614 ns per loop
In [18]: %timeit np.logical_xor(a,b)
The slowest run took 45.49 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 616 ns per loop
4. Random boolean sequences of size 10
In [19]: a = np.random.choice([True, False], size=10)
In [20]: b = np.random.choice([True, False], size=10)
In [21]: %timeit a ^ b
The slowest run took 86.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 509 ns per loop
In [22]: %timeit np.logical_xor(a,b)
The slowest run took 40.94 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 511 ns per loop
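The timings above come from IPython's %timeit. Outside IPython, a comparable measurement could be sketched with the standard timeit module. The seed, loop counts, and the timings dictionary are arbitrary choices for this sketch, and the absolute numbers will vary by machine and library version:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
timings = {}

for size in (10, 100, 1000, 10000):
    a = rng.choice([True, False], size=size)
    b = rng.choice([True, False], size=size)

    # Sanity check: both spellings agree on plain boolean arrays.
    assert np.array_equal(a ^ b, np.logical_xor(a, b))

    loops = 1000
    # Best of 3 repeats, converted to seconds per call.
    t_operator = min(timeit.repeat(lambda: a ^ b, number=loops, repeat=3)) / loops
    t_numpy = min(timeit.repeat(lambda: np.logical_xor(a, b), number=loops, repeat=3)) / loops
    timings[size] = (t_operator, t_numpy)

    print(f"size={size}: a ^ b {t_operator * 1e9:.0f} ns, "
          f"np.logical_xor {t_numpy * 1e9:.0f} ns")
```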
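I found a case where
a^b
and np.logical_xor(a,b)
are not equivalent. It really tripped me up, but the fix turned out to be simple. Hopefully this saves someone else a headache.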
The problem surfaced when I recently upgraded from Pandas 0.25.3 to 2.0.3 (and numpy from 1.19.0 to 1.24.4).
Let a be a DataFrame of bool values with duplicates in its Index. Let b be a Series of bool values as well, where b.index == a.columns.
My goal is to broadcast b across a and take the element-wise XOR of b with each row of a, with any duplicates in a.index carried through to the output.
This code worked on my old setup...
np.logical_xor(a,b.to_frame().T)
...but fails on my new setup:
TypeError: '<' not supported between instances of 'Timestamp' and 'int'
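I believe this happens because something in the broadcasting tries to join the transposed b (whose index is a meaningless [0]) to a (which has a timestamp index), and then, I believe, tries to sort the combined index to make it monotonic.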
The solution, which this OP prompted me to consider 🙏:
a^b
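A small reproduction of that setup might look like the sketch below. The timestamps, column names, and values are hypothetical, chosen only so that a.index contains a duplicate and b.index matches a.columns:

```python
import pandas as pd

# Hypothetical data: a timestamp index with a repeated label and two made-up columns.
idx = pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02"])
a = pd.DataFrame(
    [[True, False], [False, False], [True, True]],
    index=idx,
    columns=["x", "y"],
)
b = pd.Series([True, False], index=a.columns)

# DataFrame ^ Series matches b.index against a.columns and applies b to every row,
# so the row index of a (duplicates included) is kept as-is.
result = a ^ b

print(result)
```

Because the operator aligns b on the columns of a, there is no need to build a one-row frame with b.to_frame().T, and the [0] row label never has to be reconciled with the timestamp index.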
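Annoyingly/wonderfully, this also seems to work on my old pandas/numpy "production" setup. Incidentally, this was the first time I had ever used git blame. The answer: the "initial commit" was 3 years ago 🤣, so either
a^b
did not work in even older versions of Pandas, or I just did not know about it.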