Select the top n rows per group based on another column

Question

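I have a dataset like this:

I want a pandas DataFrame that keeps, for each date, the 2 rows with the largest population. The output should look like this:

I know pandas provides a method called nlargest: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.nlargest.html

But I don't think it works for this use case on its own. Is there a way to do this?

Thanks a lot in advance!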

Answer 1

I mocked up your DataFrame as shown below, and here is one way to get the result you want.

Your DataFrame:

>>> df
        Date country  population
0 2019-12-31       A         100
1 2019-12-31       B          10
2 2019-12-31       C        1000
3 2020-01-01       A         200
4 2020-01-01       B          20
5 2020-01-01       C        3500
6 2020-01-01       D          12
7 2020-02-01       D        2000
8 2020-02-01       E          54
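To make the frame above reproducible, here is a minimal sketch that rebuilds it. The values are copied from the printed output; the datetime dtype of Date is an assumption, since the printout does not show dtypes.

```python
import pandas as pd

# Rebuild the sample frame; values are copied from the printed output above.
# Assumption: Date is a datetime column (the printout does not show dtypes).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-12-31", "2019-12-31", "2019-12-31",
        "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
        "2020-02-01", "2020-02-01",
    ]),
    "country": ["A", "B", "C", "A", "B", "C", "D", "D", "E"],
    "population": [100, 10, 1000, 200, 20, 3500, 12, 2000, 54],
})

print(df)
```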

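The solution you want:

You can use the nlargest method together with the set_index and groupby methods. Setting country as the index first keeps it in the result, because the grouped nlargest returns a Series indexed by Date and the original index.

This is what you get: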

>>> df.set_index('country').groupby('Date')['population'].nlargest(2)
Date        country
2019-12-31  C          1000
            A           100
2020-01-01  C          3500
            A           200
2020-02-01  D          2000
            E            54
Name: population, dtype: int64

If you want the result back in the original flat layout, reset the index, which gives you the following:

>>> df.set_index('country').groupby('Date')['population'].nlargest(2).reset_index()
        Date country  population
0 2019-12-31       C        1000
1 2019-12-31       A         100
2 2020-01-01       C        3500
3 2020-01-01       A         200
4 2020-02-01       D        2000
5 2020-02-01       E          54
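The same chain can be wrapped in a small function so that n is easy to change. This is only a sketch: `top_n_per_date` is a hypothetical helper name, not part of pandas, and the column names are the ones from the mock frame above.

```python
import pandas as pd

def top_n_per_date(frame, n=2):
    """Hypothetical helper: keep the n largest-population rows for each Date."""
    return (
        frame.set_index("country")        # keep country available as an index level
             .groupby("Date")["population"]
             .nlargest(n)                 # Series indexed by (Date, country)
             .reset_index()               # back to flat columns
    )

# Same sample values as the mock frame in this answer.
sample = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-12-31", "2019-12-31", "2019-12-31",
        "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
        "2020-02-01", "2020-02-01",
    ]),
    "country": ["A", "B", "C", "A", "B", "C", "D", "D", "E"],
    "population": [100, 10, 1000, 200, 20, 3500, 12, 2000, 54],
})

result = top_n_per_date(sample, n=2)
print(result)
```

Calling `top_n_per_date(sample, n=1)` would keep only the single most populous country per date.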

Another approach:

Use groupby and apply with DataFrame.nlargest, then call reset_index with the drop=True and level= parameters to discard the group index:

>>> df.groupby('Date').apply(lambda p: p.nlargest(2, columns='population')).reset_index(level=[0,1], drop=True)
  # df.groupby('Date').apply(lambda p: p.nlargest(2, columns='population')).reset_index(level=['Date',1], drop=True)
        Date country  population
0 2019-12-31       C        1000
1 2019-12-31       A         100
2 2020-01-01       C        3500
3 2020-01-01       A         200
4 2020-02-01       D        2000
5 2020-02-01       E          54
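For comparison, a third variant that is not from the answer above: sort by population within each date and take the first two rows of every group with groupby(...).head(2). This avoids apply altogether. It is a sketch on the same mock data and assumes the column names used in this answer.

```python
import pandas as pd

# Same sample values as the mock frame in this answer.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-12-31", "2019-12-31", "2019-12-31",
        "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
        "2020-02-01", "2020-02-01",
    ]),
    "country": ["A", "B", "C", "A", "B", "C", "D", "D", "E"],
    "population": [100, 10, 1000, 200, 20, 3500, 12, 2000, 54],
})

top2 = (
    df.sort_values(["Date", "population"], ascending=[True, False])
      .groupby("Date")
      .head(2)                  # first 2 rows of each date, i.e. the 2 largest
      .reset_index(drop=True)   # discard the original row labels
)

print(top2)
```

Because head keeps whole rows, any additional columns in the frame are carried through unchanged.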
