RStudio: How to get the percentile at which a benchmark falls

Question

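I have the data frame below. For each "benchmark", I want to find the percentile of "value" at which it falls. For example, a "benchmark" of 100 is roughly the 75th percentile of "value".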

library(dplyr)  # provides %>%, group_by() and summarise()

group <- c(1, 1, 1, 2, 2, 2)
benchmark <- c(100, 100, 100, 200, 200, 200)
value <- c(50, 80, 120, 150, 230, 250)
d_f <- data.frame(group, benchmark, value)

d_f %>%
  group_by(group, benchmark) %>%
  summarise(q25 = quantile(value, 0.25),
            q50 = quantile(value, 0.50),
            q75 = quantile(value, 0.75)
            # more percentiles can be added here
            )

Thanks!

r dplyr percentile
1 Answer

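I think you need ecdf. The remaining question (for me) is whether your empirical cumulative distribution should be computed per group or over the whole column.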
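
For context, ecdf() from base R's stats package returns a function; calling that function at x gives the fraction of observations less than or equal to x. A minimal sketch on the question's value column (the name value_ecdf is only illustrative):

```r
value <- c(50, 80, 120, 150, 230, 250)

# ecdf() does not return a number; it returns a step function built from the data
value_ecdf <- ecdf(value)
is.function(value_ecdf)
# [1] TRUE

# Evaluate it at a benchmark: 2 of the 6 values are <= 100
value_ecdf(100)
# [1] 0.3333333

# It is vectorised, so several benchmarks can be checked at once
value_ecdf(c(100, 200)) * 100
# [1] 33.33333 66.66667
```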

Per group:

d_f %>%
  group_by(group, benchmark) %>%
  mutate(bench_pctile = ecdf(value)(benchmark) * 100)
# # A tibble: 6 x 4
# # Groups:   group, benchmark [2]
#   group benchmark value bench_pctile
#   <dbl>     <dbl> <dbl>        <dbl>
# 1     1       100    50         66.7
# 2     1       100    80         66.7
# 3     1       100   120         66.7
# 4     2       200   150         33.3
# 5     2       200   230         33.3
# 6     2       200   250         33.3
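
If one row per group is enough, or dplyr is not available, the same per-group calculation can be sketched in base R. This assumes the benchmark is constant within each group, as it is in the example data, so only the first benchmark of each group is used:

```r
d_f <- data.frame(
  group     = c(1, 1, 1, 2, 2, 2),
  benchmark = c(100, 100, 100, 200, 200, 200),
  value     = c(50, 80, 120, 150, 230, 250)
)

# Split the rows by group, then evaluate each group's ECDF at its own benchmark.
# Assumption: benchmark is constant within a group, so benchmark[1] represents it.
bench_pctile <- sapply(split(d_f, d_f$group), function(g) {
  ecdf(g$value)(g$benchmark[1]) * 100
})

bench_pctile
#        1        2 
# 66.66667 33.33333 
```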

Or across the whole column, in which case we need to call ecdf before grouping:

valecdf <- ecdf(d_f$value)
d_f %>%
  group_by(group, benchmark) %>%
  mutate(bench_pctile = valecdf(benchmark) * 100)
# # A tibble: 6 x 4
# # Groups:   group, benchmark [2]
#   group benchmark value bench_pctile
#   <dbl>     <dbl> <dbl>        <dbl>
# 1     1       100    50         33.3
# 2     1       100    80         33.3
# 3     1       100   120         33.3
# 4     2       200   150         66.7
# 5     2       200   230         66.7
# 6     2       200   250         66.7

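One way to sanity-check this approach is to compute the proportions directly, since the ECDF at x is the share of values less than or equal to x:

### grouped
mean(d_f$value[1:3] <= 100)
# [1] 0.6666667
mean(d_f$value[4:6] <= 200)
# [1] 0.3333333

### ungrouped
mean(d_f$value <= 100)
# [1] 0.3333333
mean(d_f$value <= 200)
# [1] 0.6666667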
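
The question expects a benchmark of 100 to land near the 75th percentile of group 1, whereas ecdf reports 66.7. The gap comes from interpolation: quantile() with its default settings (type 7) interpolates linearly between the sorted values, while ecdf is a step function. If the interpolated figure is what you want, a minimal sketch is to invert that interpolation with approx(). The helper name pctile_interp is hypothetical, and the sketch assumes at least two distinct values per group:

```r
# Hypothetical helper (not part of the original answer): percentile of `bench`
# within `x`, by linear interpolation between the sorted values.
pctile_interp <- function(x, bench) {
  xs <- sort(x)
  # Type-7 positions: the smallest value sits at 0, the largest at 1
  probs <- seq(0, 1, length.out = length(xs))
  # rule = 2 clamps benchmarks outside the range of x to 0 or 100
  approx(xs, probs, xout = bench, rule = 2)$y * 100
}

pctile_interp(c(50, 80, 120), 100)    # group 1
# [1] 75
pctile_interp(c(150, 230, 250), 200)  # group 2
# [1] 31.25
```

Inside the grouped pipeline this could replace the ecdf call, for example mutate(bench_pctile = pctile_interp(value, benchmark[1])). Which of the two definitions is appropriate depends on whether you want the observed share of values at or below the benchmark (ecdf) or a figure consistent with quantile().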