Calculate an arbitrary percentile on a pandas GroupBy

Question

Currently, pandas GroupBy objects have a median method.

Is there a way to calculate an arbitrary percentile (see: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html) on the groups?

The median would be the percentile computed with q=50.
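A quick check of that equivalence on a single made-up array (the data is hypothetical). Note the scale difference: np.percentile takes q on a 0 to 100 scale, while pandas' quantile takes it on a 0 to 1 scale.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])

p50_numpy = np.percentile(x, 50)        # q on a 0-100 scale
p50_pandas = pd.Series(x).quantile(0.5)  # q on a 0-1 scale
med = np.median(x)

print(p50_numpy, p50_pandas, med)
```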

Answer 1

You want the quantile method:

In [47]: df
Out[47]: 
           A         B    C
0   0.719391  0.091693  one
1   0.951499  0.837160  one
2   0.975212  0.224855  one
3   0.807620  0.031284  one
4   0.633190  0.342889  one
5   0.075102  0.899291  one
6   0.502843  0.773424  one
7   0.032285  0.242476  one
8   0.794938  0.607745  one
9   0.620387  0.574222  one
10  0.446639  0.549749  two
11  0.664324  0.134041  two
12  0.622217  0.505057  two
13  0.670338  0.990870  two
14  0.281431  0.016245  two
15  0.675756  0.185967  two
16  0.145147  0.045686  two
17  0.404413  0.191482  two
18  0.949130  0.943509  two
19  0.164642  0.157013  two

In [48]: df.groupby('C').quantile(.95)
Out[48]: 
            A         B
C                      
one  0.964541  0.871332
two  0.826112  0.969558
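A minimal sketch of the same call on a small, made-up frame (the data is hypothetical). quantile also accepts a list of quantiles; the result is then indexed by (group, q), and unstack() turns it into one column per quantile.

```python
import pandas as pd

# Hypothetical deterministic data instead of the random frame above
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    "C": ["one"] * 4 + ["two"] * 4,
})

# Several quantiles at once: a Series indexed by (C, q)
multi = df.groupby("C")["A"].quantile([0.5, 0.95])

# One row per group, one column per quantile
wide = multi.unstack()
print(wide)
```

With the default linear interpolation, q=0.5 gives the group median, so this also covers the median case.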

Answer 2

I found another useful solution here.

If I had to use groupby, another approach could be:

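import numpy as np

def percentile(n):
    # Return an aggregation function computing the n-th percentile
    def percentile_(x):
        return np.percentile(x, n)
    # agg() uses __name__ as the output column label
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_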

With the call below, I was able to get the same result as the solution given by @TomAugspurger:

df.groupby('C').agg([percentile(50), percentile(95)])
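A self-contained version of this approach on a made-up frame (the data is hypothetical). Because np.percentile defaults to linear interpolation, as pandas' quantile does, percentile_50 matches the group median and percentile_95 matches quantile(0.95).

```python
import numpy as np
import pandas as pd

def percentile(n):
    # Factory: returns a reducer for the n-th percentile of each group
    def percentile_(x):
        return np.percentile(x, n)
    # agg() labels the output column with the function's __name__
    percentile_.__name__ = f"percentile_{n}"
    return percentile_

# Hypothetical deterministic data
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    "C": ["one"] * 4 + ["two"] * 4,
})

out = df.groupby("C")["A"].agg([percentile(50), percentile(95)])
print(out)
```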

Answer 3

With pandas >= 0.25.0 you can also use named aggregation.

An example would be:

import numpy
import pandas as pd
df = pd.DataFrame({'A': numpy.random.randint(1,3,size=100),'C': numpy.random.randn(100)})
df.groupby('A').agg(min_val = ('C','min'), percentile_80 = ('C',lambda x: x.quantile(0.8)))
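A sketch extending named aggregation to several percentiles at once (the frame and the helper q are hypothetical). The helper binds p when it is called; a bare lambda written inside the dict comprehension would capture the loop variable late, so every column would end up using the last quantile.

```python
import pandas as pd

# Hypothetical deterministic data
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2, 2, 2, 2],
    "C": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

def q(p):
    # Hypothetical helper: reducer computing quantile p of a group
    return lambda s: s.quantile(p)

quantiles = {"p50": 0.5, "p80": 0.8}

out = df.groupby("A").agg(
    min_val=("C", "min"),
    # One named column per requested quantile
    **{name: ("C", q(p)) for name, p in quantiles.items()},
)
print(out)
```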

Answer 4

Building on my original answer, using pypi/pandas-wizard you can now simply do:

import numpy as np
import pandaswizard as pdw  # attempt to create a ubiquitous naming
# column: the Series (or grouped column) being aggregated
column.agg([np.sum, np.mean, pdw.percentile(50), pdw.quantile(0.95)])

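Note that the module uses an internal quantile function to emulate both percentile and pd.Series.quantile(), and that it accepts attributes such as interpolation (or method, the NumPy name for it). A wrapper was created to allow calculations based on any method that NumPy defines.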
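The pandaswizard API itself is not reproduced here. As a hypothetical stand-in, this sketch shows the same idea with plain NumPy and pandas: a percentile wrapper that forwards a NumPy method name (the method keyword of np.percentile needs NumPy 1.22 or later), next to the interpolation parameter of pandas' groupby quantile.

```python
import numpy as np
import pandas as pd

def percentile(n, method="linear"):
    # Hypothetical wrapper, NOT pandaswizard: n-th percentile using a
    # NumPy estimation method such as "linear", "lower" or "higher"
    def percentile_(x):
        return np.percentile(x, n, method=method)
    percentile_.__name__ = f"percentile_{n}_{method}"
    return percentile_

# Hypothetical deterministic data
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    "C": ["one"] * 4 + ["two"] * 4,
})
g = df.groupby("C")["A"]

out = g.agg([percentile(50, "linear"), percentile(50, "lower"), percentile(50, "higher")])

# pandas exposes comparable choices through `interpolation`
lower = g.quantile(0.5, interpolation="lower")
print(out)
print(lower)
```

With an even number of values per group, "lower" and "higher" pick the neighbouring data points instead of interpolating between them.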
