Unexpected output from pandas' DataFrameGroupBy.diff function

Question

Consider the following Python code, which is essentially copied from the first code block in the Transformation section of the "Group by: split-apply-combine" chapter of the pandas user guide.

import pandas as pd
import numpy as np

speeds = pd.DataFrame(
    data = {'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
            'order': ['Falconiformes', 'Psittaciformes', 'Carnivora', 'Primates', 'Carnivora'],
            'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},  # np.nan: the np.NaN alias was removed in NumPy 2.0
    index = ['falcon', 'parrot', 'lion', 'monkey', 'leopard']
)

grouped = speeds.groupby('class')['max_speed']
grouped.diff()

When executed in Google Colab, the output is:

falcon       NaN
parrot    -365.0
lion         NaN
monkey       NaN
leopard      NaN
Name: max_speed, dtype: float64

This is the same output shown in the user guide.

Why is the value for the parrot index element -365.0 rather than NaN, like the other values in the series?

Answer 1

The output is correct and expected. For clarity, here is a breakdown of how each value is computed:

falcon       NaN                 # NaN since first of the "bird" group
parrot    -365.0                 # 24 - 389   = -365
lion         NaN                 # NaN since first of the "mammal" group
monkey       NaN                 # NaN - 80.2 = NaN
leopard      NaN                 # 58 - NaN   = NaN
Name: max_speed, dtype: float64
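
The breakdown above can be reproduced by hand. Within each group, diff() subtracts the previous row's value from the current row's value, which amounts to the column minus its per-group shift(1). The following is a minimal sketch using the question's data; the names manual and result are only illustrative.

```python
import numpy as np
import pandas as pd

speeds = pd.DataFrame(
    data={'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
          'order': ['Falconiformes', 'Psittaciformes', 'Carnivora', 'Primates', 'Carnivora'],
          'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
)
grouped = speeds.groupby('class')['max_speed']

# Hand-written equivalent: current value minus the previous value in the same group.
# The first row of each group has no predecessor, so shift(1) yields NaN there,
# and any subtraction involving NaN is NaN as well.
manual = speeds['max_speed'] - grouped.shift(1)

result = grouped.diff()

# Show both side by side; the two columns should agree row by row.
print(pd.DataFrame({'diff': result, 'manual': manual}))
```

Only parrot has both a predecessor in its group and two non-missing operands, which is why it is the only row with a number.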

If you replace the NaN in the input with a valid value (for example, 42), you get:

falcon       NaN                 # NaN since first of the "bird" group
parrot    -365.0                 # 24 - 389   = -365
lion         NaN                 # NaN since first of the "mammal" group
monkey     -38.2                 # 42 - 80.2  = -38.2
leopard     16.0                 # 58 - 42    = 16
Name: max_speed, dtype: float64
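
To check this yourself, the sketch below swaps the missing value for 42 and reruns the grouped diff. Using fillna(42.0) is just one convenient way to make the replacement and is an assumption here, as are the names filled and filled_diff; editing the literal in the input data would work equally well.

```python
import numpy as np
import pandas as pd

speeds = pd.DataFrame(
    data={'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
          'order': ['Falconiformes', 'Psittaciformes', 'Carnivora', 'Primates', 'Carnivora'],
          'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
)

# Replace the single missing speed (monkey) with 42, leaving the original frame untouched.
filled = speeds.assign(max_speed=speeds['max_speed'].fillna(42.0))

# Same grouped diff as in the question, now without any NaN inputs.
filled_diff = filled.groupby('class')['max_speed'].diff()
print(filled_diff)
```

With no missing inputs, the only NaN values left are the first rows of each group (falcon and lion), which have nothing to be compared against.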
