Get a column's value at the min/max of another column of a pandas DataFrame inside a groupby aggregation


The pandas DataFrame:

import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'date': ['2023-01-15', '2023-02-20', '2023-01-10', '2023-03-05', '2023-02-01', '2023-04-10'],
    'value': [10, 15, 5, 25, 8, 12],
})

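I am trying to get the value of the 'value' column at the minimum and maximum of the 'date' column for each 'group', inside an aggregate function: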

    # the following doesn't work
    output = (
        data
        .groupby(['group'], as_index=False)
        .agg(
            # there are some other additional aggregate functions happening here too
            value_at_min=('value', lambda x: x.loc[x['date'].idxmin()]),
            value_at_max=('value', lambda x: x.loc[x['date'].idxmax()]),
        )
    )

This does not work even after converting the dates to datetime (in fact, my original date column is already in datetime format).

The expected output is:

    group   min_date    max_date    value_at_min    value_at_max
0   A       2023-01-15  2023-02-20      10              15
1   B       2023-01-10  2023-03-05      5               25
2   C       2023-02-01  2023-04-10      8               12
1 Answer

I would rather get the idxmin/idxmax of the dates, then slice the original DataFrame:

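# idxmin/idxmax need a comparable dtype, so make sure 'date' is datetime
data['date'] = pd.to_datetime(data['date'])

# row labels of the earliest and latest date within each group
# (aggregate on 'date', not 'value', so the lookup follows the dates)
tmp = data.groupby('group')['date'].agg(['idxmin', 'idxmax'])

out = (data.loc[tmp['idxmin']]
           .merge(data.loc[tmp['idxmax']],
                  on='group', suffixes=('_min', '_max'))
      )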

Output:

  group    date_min  value_min    date_max  value_max
0     A  2023-01-15         10  2023-02-20         15
1     B  2023-01-10          5  2023-03-05         25
2     C  2023-02-01          8  2023-04-10         12
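
If the lookup has to stay inside the same `agg` call (for example because other aggregations run there too), here is a minimal sketch of one way to adapt the attempt from the question. The attempt most likely fails because a named aggregation on `'value'` hands the lambda only that Series, so `x['date']` has nothing to index. The sketch instead aggregates over `'date'` and looks the value up in the original frame by row label. The closure over `data` is my own assumption, not part of the answer above, and it relies on `data` having a unique index.

```python
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'date': ['2023-01-15', '2023-02-20', '2023-01-10', '2023-03-05', '2023-02-01', '2023-04-10'],
    'value': [10, 15, 5, 25, 8, 12],
})
# idxmin/idxmax need a comparable dtype, so convert the strings to datetime
data['date'] = pd.to_datetime(data['date'])

output = (
    data
    .groupby('group', as_index=False)
    .agg(
        min_date=('date', 'min'),
        max_date=('date', 'max'),
        # each lambda receives the group's 'date' Series, which keeps the
        # original row labels; use the label to read 'value' from `data`
        value_at_min=('date', lambda s: data.loc[s.idxmin(), 'value']),
        value_at_max=('date', lambda s: data.loc[s.idxmax(), 'value']),
    )
)

print(output)
```

This keeps the column names from the expected output in the question, at the cost of a Python-level lambda per group, so the idxmin/merge approach above is probably the better choice on large frames.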