Select the top n values by group, where n depends on another value in the data frame

Question

I am quite new to R and to coding in general. Any help would be greatly appreciated :)

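I am trying to select the top n values by group, where n depends on another value in my data frame (called factor below). The selected values should then be summarised per group to compute their mean (d100). My goal is to get one d100 value per group.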

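(Background: in forestry there is a metric called d100, the mean diameter of the 100 thickest trees per hectare. If the sampled area is smaller than 1 hectare, correspondingly fewer trees have to be selected to calculate d100. That is what factor is for.)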
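As a rough illustration of this scaling, the snippet below derives the number of trees to select from the three factor values used in the sample data. Reading factor as a plot area in hectares and rounding the count down with floor() are both assumptions; the question does not state a rounding rule.

```r
# factor values copied from the sample data; treating them as
# plot areas in hectares is an assumption
area_ha <- c(pi * (15 / 100)^2, pi * (20 / 100)^2, pi * (25 / 100)^2)

# 100 trees per hectare, scaled to the plot area and rounded down (assumed rule)
n_trees <- floor(100 * area_ha)
n_trees
# 7 12 19
```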

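First, I tried putting factor into the data frame as its own column. Then I thought something like a "lookup table" might help, since R says that n must be a number. But I do not know how to write such a lookup. (See the last part of the sample code.) Or would summarising df$factor before using it solve the problem?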

Sample data:

(Expressions that I am unsure how to code in R are marked in quotes, like this: 'I don't know how')

# creating sample data
library(tidyverse)

df <- data.frame(group = c(rep(1, each = 5), rep(2, each = 8), rep(3, each = 10)),
                 BHD = c(rnorm(23, mean = 30, sd = 5)),
                 factor = c(rep(pi*(15/100)^2, each = 5), rep(pi*(20/100)^2, each = 8), rep(pi*(25/100)^2, each = 10))
                )

# group by ID, then select top_n values of df$BHD with n depending on value of df$factor
df %>% 
  group_by(group) %>% 
  slice_max(
    BHD, 
    n = 100*df$factor, 
    with_ties = F) %>% 
  summarise(d100 = mean('sliced values per group'))

# other thought: having a "lookup-table" for the factor like this:
lt <- data.frame(group = c(1, 2, 3),
                 factor = c(pi*(15/100)^2, pi*(20/100)^2, pi*(25/100)^2))

# then
df %>% 
  group_by(group) %>% 
  slice_max(
    BHD, 
    n = 100*lt$factor 'where lt$group == df$group', 
    with_ties = F) %>% 
  summarise(d100 = mean('sliced values per group'))

I have found this question, which addresses a problem similar to mine, but it did not help much.

Answer 1

Since all factor values within a group are identical, you can pick any one of them.

library(dplyr)

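# first(factor) picks the group's (constant) factor value;
# top_n() then keeps the n rows with the largest BHD in each group.
# In the sample data n exceeds every group's size, so all 23 rows are returned.
df %>%
  group_by(group) %>%
  top_n(BHD, n = 100 * first(factor)) %>%
  ungroup()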

#   group   BHD factor
#   <dbl> <dbl>  <dbl>
# 1     1  25.8 0.0707
# 2     1  24.6 0.0707
# 3     1  27.6 0.0707
# 4     1  28.3 0.0707
# 5     1  29.2 0.0707
# 6     2  28.8 0.126 
# 7     2  39.5 0.126 
# 8     2  23.1 0.126 
# 9     2  27.9 0.126 
#10     2  31.7 0.126 
# … with 13 more rows
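To get from the selected rows to one d100 value per group, a mean still has to be taken. The following is a minimal sketch of that last step, not part of the answer above. It uses filter() with row_number() instead of top_n(), so that the cutoff is computed explicitly inside each group. The data frame trees and its values are hypothetical, chosen so that fewer trees are selected than each group contains, and rounding down with floor() is an assumption.

```r
library(dplyr)

# Hypothetical, deterministic data: two plots with known diameters
trees <- data.frame(
  group  = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  BHD    = c(10, 40, 30, 20, 50, 10, 20, 40, 30),
  factor = c(rep(0.025, 4), rep(0.035, 5))
)

d100 <- trees %>%
  group_by(group) %>%
  # rank rows by descending BHD within each group and keep the
  # first floor(100 * factor) of them (2 for group 1, 3 for group 2)
  filter(row_number(desc(BHD)) <= floor(100 * first(factor))) %>%
  # mean diameter of the selected trees, one row per group
  summarise(d100 = mean(BHD))

d100
```

With these values, group 1 keeps the diameters 40 and 30 (mean 35) and group 2 keeps 50, 40 and 30 (mean 40).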

Answer 2

Alternative solution

This gives you the rows with the highest factor value in each group:

df %>%
  group_by(group) %>%
  slice_max(factor)

# # A tibble: 23 x 3
# # Groups:   group [3]
#    group   BHD factor
#    <dbl> <dbl>  <dbl>
#  1     1  29.9 0.0707
#  2     1  31.8 0.0707
#  3     1  25.7 0.0707
#  4     1  30.7 0.0707
#  5     1  23.6 0.0707
#  6     2  23.8 0.126
#  7     2  28.3 0.126
#  8     2  30.7 0.126
#  9     2  26.2 0.126
# 10     2  31.4 0.126
# # i 13 more rows
# # i Use `print(n = ...)` to see more rows

You can then summarise it as d100:

df %>%
  group_by(group) %>%
  slice_max(factor) %>%
  summarise(d100 = mean(100 * factor))

# # A tibble: 3 x 2
#   group  d100
#   <dbl> <dbl>
# 1     1  7.07
# 2     2 12.6
# 3     3 19.6
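Note that the d100 column in this output is the mean of 100 * factor, that is, the number of trees to select in each group rather than a mean diameter. A minimal sketch of going on to the mean diameter, this time using the lookup-table idea from the question, might look as follows. The data frames trees_lt and area_lt and their values are hypothetical, and rounding down with floor() is an assumption.

```r
library(dplyr)

# Hypothetical tree data without a factor column
trees_lt <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  BHD   = c(10, 40, 30, 20, 50, 10, 20, 40, 30)
)

# Hypothetical lookup table: one factor value per group
area_lt <- data.frame(group = c(1, 2), factor = c(0.025, 0.035))

d100_lt <- trees_lt %>%
  # attach each group's factor to its rows
  left_join(area_lt, by = "group") %>%
  group_by(group) %>%
  # keep the floor(100 * factor) largest diameters per group
  filter(row_number(desc(BHD)) <= floor(100 * first(factor))) %>%
  summarise(d100 = mean(BHD))

d100_lt
```

Joining first keeps the selection step identical to the case where factor is already a column, so the lookup table only changes how the data are prepared.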
