Python - Pandas - What is the exact formula the describe() method uses to compute percentiles?

Question

Can someone explain how the describe() method calculates percentiles?

Different sources explain this using different methods. What exactly is the calculation?

For example, consider the following code:

import pandas as pd

l = [10, 13, 15, 19, 21, 25]
s = pd.Series(l)
s.describe()

The output is:

count     6.000000
mean     17.166667
std       5.528713
min      10.000000
25%      13.500000
50%      17.000000
75%      20.500000
max      25.000000

Can someone explain how the 25% value (Q1) is calculated?

Answer 1

I found the explanation here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.quantile.html

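The df.quantile() function has parameters that let you compute quantiles with different interpolation methods. The default used by df.describe() appears to be linear, as shown here: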

s.quantile([.25, .5, .75], interpolation="linear")

which gives:

0.25    13.5
0.50    17.0
0.75    20.5
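As a cross-check, here is a minimal sketch that reproduces the linear method by hand and compares it with quantile() and describe(). It assumes (consistent with the outputs above) that the quantile's position in the sorted data is (n - 1) * q with 0-based indexing; the helper name linear_quantile is hypothetical and not part of pandas.

```python
import math

import pandas as pd


def linear_quantile(values, q):
    """Hypothetical helper: linear-interpolation quantile, assumed to match pandas' default."""
    data = sorted(values)
    # Fractional (0-based) position of the quantile within the sorted data.
    pos = (len(data) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)  # clamp so q = 1.0 stays in range
    fraction = pos - lo
    # i + (j - i) * fraction, where i and j are the neighbouring data points.
    return data[lo] + (data[hi] - data[lo]) * fraction


l = [10, 13, 15, 19, 21, 25]
s = pd.Series(l)

manual = [linear_quantile(l, q) for q in (0.25, 0.5, 0.75)]
from_pandas = s.quantile([0.25, 0.5, 0.75]).tolist()
from_describe = s.describe()[["25%", "50%", "75%"]].tolist()

print(manual)  # [13.5, 17.0, 20.5]
print(from_pandas)
print(from_describe)
```

For Q1 the position is 5 * 0.25 = 1.25, so the result lies a quarter of the way from 13 (index 1) to 15 (index 2), which gives 13.5.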

You can also use:

s.quantile([.25, .5, .75], interpolation="nearest")

to get:

0.25    13
0.50    15
0.75    21
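A similar hand-rolled sketch of "nearest" rounds the position (n - 1) * q to a whole index and returns that data point. The helper nearest_quantile is hypothetical. The assumption that exact halves round to the even index (so position 2.5 picks index 2, the value 15) is based on the 0.50 result above and on NumPy rounding with np.around; Python's built-in round() also rounds halves to even.

```python
import pandas as pd


def nearest_quantile(values, q):
    """Hypothetical helper: return the data point at the rounded (0-based) position."""
    data = sorted(values)
    pos = (len(data) - 1) * q
    # round() sends exact halves to the nearest even integer (2.5 -> 2),
    # assumed here to mirror the rounding behind interpolation="nearest".
    return data[round(pos)]


l = [10, 13, 15, 19, 21, 25]
s = pd.Series(l)

manual = [nearest_quantile(l, q) for q in (0.25, 0.5, 0.75)]
from_pandas = s.quantile([0.25, 0.5, 0.75], interpolation="nearest").tolist()

print(manual)  # [13, 15, 21]
print(from_pandas)  # expected to match manual
```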

The "nearest" result is what I was expecting.

Answer 2

This is described in the quantile documentation (interpolation is linear by default):

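linear: i + (j - i) * fraction

where i and j are the two data points on either side of the requested quantile, and fraction is the fractional part of the index position that lies between them.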
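Applying the linear formula to the question's data shows where each quartile comes from. This worked example assumes the index position is (n - 1) q with 0-based indexing into the sorted values x_(0), ..., x_(5) = 10, 13, 15, 19, 21, 25 (n = 6), which reproduces the describe() output in the question.

```latex
\begin{aligned}
\text{pos} &= (n-1)\,q,\quad i = x_{(\lfloor \text{pos} \rfloor)},\quad j = x_{(\lfloor \text{pos} \rfloor + 1)},\quad \text{fraction} = \text{pos} - \lfloor \text{pos} \rfloor \\
Q_{0.25}&:\ \text{pos} = 5 \times 0.25 = 1.25 \;\Rightarrow\; 13 + (15 - 13) \times 0.25 = 13.5 \\
Q_{0.50}&:\ \text{pos} = 5 \times 0.50 = 2.5 \;\Rightarrow\; 15 + (19 - 15) \times 0.5 = 17.0 \\
Q_{0.75}&:\ \text{pos} = 5 \times 0.75 = 3.75 \;\Rightarrow\; 19 + (21 - 19) \times 0.75 = 20.5
\end{aligned}
```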
