Unexpected output from pandas' DataFrameGroupBy.diff function


Consider the following Python code, which is essentially copied from the first code block in the Transformation section of the "Group by: split-apply-combine" chapter of the pandas user guide.

import pandas as pd
import numpy as np

speeds = pd.DataFrame(
    data = {'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
            'order': ['Falconiformes', 'Psittaciformes', 'Carnivora', 'Primates', 'Carnivora'],
            'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},  # np.nan: the np.NaN alias was removed in NumPy 2.0
    index = ['falcon', 'parrot', 'lion', 'monkey', 'leopard']
)

grouped = speeds.groupby('class')['max_speed']
grouped.diff()

When executed in Google Colab, the output is:

falcon       NaN
parrot    -365.0
lion         NaN
monkey       NaN
leopard      NaN
Name: max_speed, dtype: float64

This is the same output shown in the user guide.

Why is the value for the parrot index element -365.0 rather than NaN, like the other values in this Series?
python pandas
1 Answer

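The output is correct and expected. diff() is computed within each group: every value has the previous value in the same group subtracted from it, so the first row of each group has nothing to subtract and yields NaN, and any subtraction involving NaN also yields NaN. For clarity, here is a breakdown of what happens: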

falcon       NaN                 # NaN since first of the "bird" group
parrot    -365.0                 # 24 - 389   = -365
lion         NaN                 # NaN since first of the "mammal" group
monkey       NaN                 # NaN - 80.2 = NaN
leopard      NaN                 # 58 - NaN   = NaN
Name: max_speed, dtype: float64
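
The breakdown above can be reproduced by hand. The following is a minimal sketch, not taken from the user guide, that assumes the per-group difference is equivalent to subtracting the group-wise shifted column from the original one. The names previous and manual are illustrative only, and the order column is left out because it does not affect the result.

```python
import numpy as np
import pandas as pd

speeds = pd.DataFrame(
    data={'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
          'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
)

grouped = speeds.groupby('class')['max_speed']

# Previous value within the same group; the first row of each group has none.
previous = grouped.shift()

# Subtracting the shifted column by hand should match grouped.diff().
manual = speeds['max_speed'] - previous
result = grouped.diff()

# Side-by-side view of each row, its in-group predecessor, and the difference.
print(pd.DataFrame({'current': speeds['max_speed'],
                    'previous': previous,
                    'diff': result}))
```

In the printed table, parrot is the only row where both the current and the previous value are valid numbers, which is why it is the only row with a non-NaN difference.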

If you replace the NaN in the input with a valid value (for example, 42), you get:

falcon       NaN                 # NaN since first of the "bird" group
parrot    -365.0                 # 24 - 389   = -365
lion         NaN                 # NaN since first of the "mammal" group
monkey     -38.2                 # 42 - 80.2  = -38.2
leopard     16.0                 # 58 - 42    = 16
Name: max_speed, dtype: float64
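
To check this yourself, here is a small sketch that fills the missing speed with 42 before grouping. Using fillna for the substitution is an assumption made for illustration (editing the input list directly works just as well), and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

speeds = pd.DataFrame(
    data={'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
          'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
)

# Replace the missing monkey speed with 42 so every in-group subtraction is defined.
filled = speeds.assign(max_speed=speeds['max_speed'].fillna(42.0))

filled_diff = filled.groupby('class')['max_speed'].diff()
print(filled_diff)
```

Only the first row of each group (falcon and lion) remains NaN, because those rows still have no predecessor within their group.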