Consider the following Python code, which is essentially copied from the first code snippet in the Transformation section of the Group by: split-apply-combine chapter of the pandas user guide.
import pandas as pd
import numpy as np

speeds = pd.DataFrame(
    data={'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
          'order': ['Falconiformes', 'Psittaciformes', 'Carnivora', 'Primates', 'Carnivora'],
          # np.nan rather than np.NaN: the np.NaN alias was removed in NumPy 2.0
          'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard']
)
grouped = speeds.groupby('class')['max_speed']
grouped.diff()
When executed in Google Colab, the output is:
falcon NaN
parrot -365.0
lion NaN
monkey NaN
leopard NaN
Name: max_speed, dtype: float64
This is the same as the output shown in the user guide.
Why is the value for the parrot index element -365.0 rather than NaN, like the other values in the series?
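The output is correct and expected. diff() is applied within each group, so every row is compared with the previous row of the same class. For clarity, here is a breakdown of what it does: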
falcon NaN # NaN since first of the "bird" group
parrot -365.0 # 24 - 389 = -365
lion NaN # NaN since first of the "mammal" group
monkey NaN # NaN - 80.2 = NaN
leopard NaN # 58 - NaN = NaN
Name: max_speed, dtype: float64
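The breakdown above can be reproduced by hand. The following is a minimal sketch, assuming that the grouped diff() with its default period of 1 amounts to subtracting the previous value within the same group; the names previous, manual_diff and comparison are only illustrative and do not come from the user guide.

```python
import numpy as np
import pandas as pd

speeds = pd.DataFrame(
    data={'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
          'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
)
grouped = speeds.groupby('class')['max_speed']

# Previous value within the same group; the first row of each group has none.
previous = grouped.shift(1)

# Subtracting it by hand should match grouped.diff().
manual_diff = speeds['max_speed'] - previous

# Side-by-side view: any row whose value or previous value is NaN yields NaN.
comparison = pd.DataFrame({
    'value': speeds['max_speed'],
    'previous_in_group': previous,
    'manual_diff': manual_diff,
    'diff': grouped.diff(),
})
print(comparison)
```

Only parrot has both a valid value and a valid predecessor in its group, which is why it is the only row with a number.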
If you replace the NaN in the input with a valid value (e.g. 42), you will get:
falcon NaN # NaN since first of the "bird" group
parrot -365.0 # 24 - 389 = -365
lion NaN # NaN since first of the "mammal" group
monkey -38.2 # 42 - 80.2 = -38.2
leopard 16.0 # 58 - 42 = 16
Name: max_speed, dtype: float64
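To check this yourself, here is a short sketch that swaps the missing value for 42 and reruns the grouped diff. The name speeds_filled is hypothetical, and setting the value with .loc is just one way to do the replacement.

```python
import numpy as np
import pandas as pd

speeds = pd.DataFrame(
    data={'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
          'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
)

# Work on a copy so the original frame keeps its NaN.
speeds_filled = speeds.copy()
speeds_filled.loc['monkey', 'max_speed'] = 42.0

# Now only the first row of each group lacks a predecessor.
result = speeds_filled.groupby('class')['max_speed'].diff()
print(result)
```

With the gap filled, the only NaN values left are the first rows of each group (falcon and lion).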