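I'm new to this, and most guides either don't return the result I need or go over my head. group_by followed by summarize lets me compute the mean/median of these rows, but the returned table does not have fewer rows.
Here is a sample of my data.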
structure(list(S = c("Balaclava", "Balaclava", "Carnegie", "Carnegie"), Rn = c(3, 2, 2, 2), T = c("h", "u", "t", "u" ), P = c(1690000, 540000, 795000, 6e+05), M = c("S", "VB", "S", "SP"), D = c(6.6, 6.6, 11.4, 11.4), BR = c(3, 2, 2, 2), BT = c(2, 1, 2, 1), C = c(2, 1, 1, 1), L = c(339, 483, 133, 73), BA = c(159, 51, 104, 61), YB = c(1890, 1970, 2009, 1970)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
data2 <- data1 %>%
  group_by(S) %>%
  summarize(MRn = median(Rn),
            APA = mean(P),
            AAA = mean(BA),
            AAL = mean(L), YB) %>%
  arrange(desc(MRn))
data2
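I want to group the rows that share the same value in column S and compute mean/median values for the columns to the right of S, with one row per S entry. The resulting "groups" will then be used for plotting.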
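summarize normally reduces the number of rows because it is usually paired with functions that return a single value for the whole group (such as mean or median), which gives 1 row per group. You do that here, but at the end you also tell it to return YB (I assume that is what you meant, since there is no Y in the data) without any transformation.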
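If you look at the output, you will see that you did produce 1 row per group, but that row was then duplicated so that both values of YB could be kept: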
  S           MRn     APA   AAA   AAL    YB
  <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>
1 Balaclava   2.5 1115000 105     411  1890
2 Balaclava   2.5 1115000 105     411  1970
3 Carnegie    2    697500  82.5   103  2009
4 Carnegie    2    697500  82.5   103  1970
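One way to see where the extra rows come from is to count how many values each expression returns per group. The following is only a minimal sketch: it rebuilds a cut-down copy of the sample data (just S, Rn and YB), and the names checks, n_median and n_YB are made up for illustration.

```r
library(dplyr)

# Cut-down copy of the sample data from the question (S, Rn, YB only).
data1 <- tibble(
  S  = c("Balaclava", "Balaclava", "Carnegie", "Carnegie"),
  Rn = c(3, 2, 2, 2),
  YB = c(1890, 1970, 2009, 1970)
)

# Count how many values each summary expression yields within a group.
checks <- data1 %>%
  group_by(S) %>%
  summarize(n_median = length(median(Rn)),  # median() collapses the group to 1 value
            n_YB     = length(YB))          # the bare column keeps every row in the group
checks
```

Any expression that yields more than one value per group forces summarize to emit that many rows for the group. As far as I know, dplyr 1.1.0 and later also warn about this and suggest reframe() for cases where multiple rows per group are really intended.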
To get the result you want, either remove YB or apply a function to it that returns a single value (such as min, or paste0 with collapse):
data1 %>%
  group_by(S) %>%
  summarize(MRn = median(Rn),
            APA = mean(P),
            AAA = mean(BA),
            AAL = mean(L)) %>%
  arrange(desc(MRn))
# A tibble: 2 × 5
  S           MRn     APA   AAA   AAL
  <chr>     <dbl>   <dbl> <dbl> <dbl>
1 Balaclava   2.5 1115000 105     411
2 Carnegie    2    697500  82.5   103
data1 %>%
  group_by(S) %>%
  summarize(MRn = median(Rn),
            APA = mean(P),
            AAA = mean(BA),
            AAL = mean(L),
            YB = paste0(YB, collapse = ",")) %>%
  arrange(desc(MRn))
# A tibble: 2 × 6
  S           MRn     APA   AAA   AAL YB
  <chr>     <dbl>   <dbl> <dbl> <dbl> <chr>
1 Balaclava   2.5 1115000 105     411 1890,1970
2 Carnegie    2    697500  82.5   103 2009,1970
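If YB needs to stay numeric (for example to put it on a plot axis), min or max may suit better than pasting the values together. This is just a sketch under that assumption; the column names YB_min and YB_max and the object by_suburb are my own, and the data is a cut-down copy of the sample.

```r
library(dplyr)

# Cut-down copy of the sample data from the question.
data1 <- tibble(
  S  = c("Balaclava", "Balaclava", "Carnegie", "Carnegie"),
  Rn = c(3, 2, 2, 2),
  P  = c(1690000, 540000, 795000, 600000),
  YB = c(1890, 1970, 2009, 1970)
)

# Each expression returns exactly one value per group, so each S gets one row.
by_suburb <- data1 %>%
  group_by(S) %>%
  summarize(MRn    = median(Rn),
            APA    = mean(P),
            YB_min = min(YB),   # earliest YB in the group
            YB_max = max(YB))   # latest YB in the group
by_suburb
```

Which reducer makes sense depends on what YB should mean at the group level; paste0 keeps every value but turns the column into text, while min/max keep it numeric and drop the values in between.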