Preserving the original index when using pandas groupby

Question

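I have the following DataFrame and want to group it by year and return the maximum value for each year, while keeping the index value (the date) at which that maximum occurs: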

import pandas as pd

dct = {
"date": ["2019-01-01", "2019-04-1", "2020-01-01"],
"high": [100, 150, 100],
}
df = pd.DataFrame(dct)
df.set_index("date",inplace=True)
df.index = [pd.Timestamp(i) for i in df.index]
df.index.name = "date" 

# date         high
# 2019-01-01   100
# 2019-04-01   150
# 2020-01-01   100

With pandas groupby I can group the rows by year, but I do not get the dates I want:

func = lambda x: x.year
df["high"].groupby(func).max()

# date    high
# 2019    150
# 2020    100

The output I want, still using pandas groupby, is:

 # NOTE : the date index is like the original

 # date         high
 # 2019-04-01   150
 # 2020-01-01   100

Answer 1

Another approach is to use idxmax together with loc:

df.loc[df.groupby(df.index.year).high.idxmax()]

Output:

            high
date            
2019-04-01   150
2020-01-01   100
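
To make the two steps in the one-liner above easier to follow, here is a minimal self-contained sketch that rebuilds the question's data and splits the call apart. The names idx and result are only illustrative. Note that idxmax returns the first label when a year has several rows sharing the maximum.

```python
import pandas as pd

# Rebuild the question's frame with a DatetimeIndex named "date".
df = pd.DataFrame(
    {"high": [100, 150, 100]},
    index=pd.to_datetime(["2019-01-01", "2019-04-01", "2020-01-01"]),
)
df.index.name = "date"

# Step 1: for each year, find the index label (date) of the row with the largest "high".
idx = df.groupby(df.index.year)["high"].idxmax()

# Step 2: use those labels to pull the full rows, so the original dates are kept.
result = df.loc[idx]
print(result)
```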

Answer 2

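Sort with sort_values, then take tail(1) of each group:

# Group with a function so the year is computed from the sorted frame's own
# index labels. Passing df.index.year here would be matched by position
# against the re-ordered rows and can assign rows to the wrong year.
df.sort_values('high').groupby(lambda d: d.year).tail(1)

Output:
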
            high
date            
2020-01-01   100
2019-04-01   150
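
The rows above come back in order of the sorted values rather than in date order. If chronological order matters, one option is to finish with sort_index. The following is a sketch under that assumption; by_high and result are illustrative names, and the grouping key is taken from the sorted frame so that keys and rows stay aligned.

```python
import pandas as pd

df = pd.DataFrame(
    {"high": [100, 150, 100]},
    index=pd.to_datetime(["2019-01-01", "2019-04-01", "2020-01-01"]),
)
df.index.name = "date"

# Sort ascending by value so each year's maximum is the last row of its group.
by_high = df.sort_values("high")

# Take the last row per year, then restore chronological order.
result = by_high.groupby(by_high.index.year).tail(1).sort_index()
print(result)
```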

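When you run df["high"].groupby(func).max(), you are doing a Series groupby followed by an aggregation rather than selecting rows from the DataFrame. The result has one value per group and is indexed by the group key (the year), so it does not inherit the DataFrame's date index.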
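
The difference can be seen by comparing an aggregation with a row filter. The sketch below is not taken from either answer: it uses transform("max"), which broadcasts each year's maximum back onto the original rows, and then keeps the rows that equal it. Unlike idxmax, this keeps every row that ties for the maximum. The names yearly_max, mask and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"high": [100, 150, 100]},
    index=pd.to_datetime(["2019-01-01", "2019-04-01", "2020-01-01"]),
)
df.index.name = "date"

years = df.index.year

# Aggregation: one value per group, indexed by the group key (the year).
yearly_max = df["high"].groupby(years).max()

# Transform: same length and index as the original, holding each row's yearly maximum.
mask = df["high"] == df["high"].groupby(years).transform("max")

# Boolean filtering selects original rows, so the date index is preserved.
result = df[mask]
print(yearly_max)
print(result)
```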
