Find the top 2 columns of a DataFrame using pandas

Question

I pivoted rows into columns, grouping by year and month, starting from this DataFrame:

year  month  count  reason 
2001      1      1       a
2001      2      3       b
2001      3      4       c
2005      1      4       a
2005      1      3       c

Using this code:

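import pandas as pd
# Pivot each reason into its own column, one row per (year, month)
df2 = pd.pivot_table(df, index=["year", "month"], values=["count"], columns="reason").reset_index().fillna(0)
# Flatten the two-level column labels: ("count", "a") becomes "reason_a"
df2.columns = [i[0] if i[0] != "count" else f"reason_{i[1]}" for i in df2.columns]
# Row total across the three reason columns
df2["count"] = df2.iloc[:, 2:5].sum(axis=1)
print(df2)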

This changes the structure of the DataFrame to look like this:

year  month  reason_a  reason_b  reason_c  count
2001    1        1         0         0        1
2001    2        0         3         0        3
2001    3        0         0         4        4 
2005    1        4         0         3        7

Next, I want to keep only the two reason columns with the highest totals:

find_top_two = [df2.iloc[:,2:-1].sum().nlargest(2)]
find_top_two

The output looks like this:

[reason_c    7.0
 reason_a    5.0
 dtype: float64]

However, the output I expect is a DataFrame like this:

year   month   reason_a  reason_c  
2001       1        1         0
2001       2        0         0
2001       3        0         4
2005       1        4         3

Can anyone help me solve this? Any help would be greatly appreciated. Thanks in advance.

Answer 1

Use nlargest, T, and join:

df3 = df2[['year', 'month']].join(df2.iloc[:,2:-1].T.nlargest(2, df2.index).T)

Out[28]:
   year  month  reason_a  reason_b
0  2001      1         1         0
1  2001      2         0         3
2  2001      3         0         0
3  2005      1         4         0
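Note that the output above keeps reason_a and reason_b, while the question asks for reason_a and reason_c. That appears to be because nlargest(2, df2.index) on the transposed frame ranks the reason rows by their value in the first row and only uses later rows to break ties, rather than ranking by column totals. A minimal sketch of an alternative that ranks by column sums follows; it rebuilds df2 by hand from the table in the question so it can run on its own, and the names reason_cols, top_two, and keep are illustrative, not from the original posts.

```python
import pandas as pd

# The pivoted frame from the question, rebuilt by hand so the sketch is self-contained.
df2 = pd.DataFrame({
    "year": [2001, 2001, 2001, 2005],
    "month": [1, 2, 3, 1],
    "reason_a": [1, 0, 0, 4],
    "reason_b": [0, 3, 0, 0],
    "reason_c": [0, 0, 4, 3],
    "count": [1, 3, 4, 7],
})

# All reason columns, in their original left-to-right order.
reason_cols = [c for c in df2.columns if c.startswith("reason_")]

# Labels of the two reason columns with the largest column totals.
top_two = df2[reason_cols].sum().nlargest(2).index

# Keep the original column order rather than the descending-total order.
keep = [c for c in reason_cols if c in top_two]

df3 = df2[["year", "month"] + keep]
print(df3)
```

With the sample data this selects reason_a and reason_c, matching the expected output in the question. Filtering reason_cols instead of indexing with top_two directly is a design choice to preserve the original column order.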
