Here is the dput() structure of the data:
library(tidyverse)
df <- structure(list(L1 = c("Age Class", "Age Class", "Age Class",
"Age Class", "Gender", "Gender", "Gender", "Age Class", "Age Class",
"Age Class", "Gender", "Gender", "Age Class", "Age Class", "Age Class",
"Gender"), L2 = c("Older Youth", "Older Youth", "Younger Youth",
"Younger Youth", "Female", "Female", "Female", "Younger Youth",
"Older Youth", "Older Youth", "Male", "Male", "Younger Youth",
"Older Youth", "Older Youth", "Female"), scr = c(0.78125, 0.90625,
0.90625, 0.6875, 0.875, 0.78125, 1, 0.65625, 0.75, 0.59375, 0.8125,
0.75, 0.65625, 0.6875, 0.75, 0.75)), row.names = c(NA, -16L), class = "data.frame")
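I want to:
1. Compute the median and standard error as overall statistics.
2. Compute the median and standard error again, grouped by L1 and L2.
3. Run a Wilcoxon test within each L1 group, since each L1 group contains exactly two L2 levels.
4. Combine the three results: bind_rows() the results of steps 1 and 2, then left_join() the p-values from step 3 onto that data set.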
The desired final result is shown in the image below:
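I tried building a list() with one element per step inside a dplyr pipeline, but working with a list(), that is, selecting or filtering its elements within dplyr or a pipe, is cumbersome. The chunk below works, but I would like to keep the list handling to a minimum. In particular, I think the second half of the code could be shortened or simplified.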
df %>%
  list(
    a = {.} %>% mutate(L1 = "All", L2 = "All") %>% summarise(mdn = median(scr), se = (sd(scr) / sqrt(length(scr))), .by = c(L1, L2)),
    b = {.} %>% summarise(mdn = median(scr), se = (sd(scr) / sqrt(length(scr))), .by = c(L1, L2)),
    c = {.} %>% summarise(pv = wilcox.test(scr ~ L2)$p.value, .by = L1)) %>%
  list(
    d = {.} %>% keep(names(.) %in% c('a', 'b')) %>% bind_rows(), # Reduce code from this line
    c = {.} %>% pluck("c")) %>%
  keep(names(.) %in% c('c', 'd')) %>%
  reduce(left_join, by = "L1") # to this line
I wonder whether nested data frames could help here. Any purrr::map() approach that shortens the script would be welcome.
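Following Adriano's point, I think the simplest approach (by far) is to perform these three very different operations separately and then bind the outputs together: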
# Overall statistics
out_1 <- df %>%
  summarize(
    mdn = median(scr),
    se = sd(scr) / sqrt(n())
  ) %>%
  mutate(
    L1 = "All",
    L2 = "All"
  )
# Statistics by group (.groups = "drop" returns an ungrouped result)
out_2 <- df %>%
  group_by(L1, L2) %>%
  summarize(
    mdn = median(scr),
    se = sd(scr) / sqrt(n()),
    .groups = "drop"
  )
# Wilcoxon test within each L1 (compares the two L2 levels)
out_3 <- df %>%
  group_by(L1) %>%
  summarize(
    pv = wilcox.test(scr ~ L2)$p.value
  )
# Combine: stack the statistics, then attach the p-values by L1
out <- out_1 %>%
  bind_rows(out_2) %>%
  left_join(out_3, by = "L1")
       mdn         se        L1            L2        pv
1 0.750000 0.02702097       All           All        NA
2 0.750000 0.04224854 Age Class   Older Youth 0.5894851
3 0.671875 0.06034703 Age Class Younger Youth 0.5894851
4 0.828125 0.05615588    Gender        Female 0.6385921
5 0.781250 0.03125000    Gender          Male 0.6385921
If you need to do this repeatedly, you can wrap it in a function.
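As a rough illustration of that idea, the three steps could be wrapped as shown below. This is only a sketch: the name summarise_scr is hypothetical, and the function assumes the input has the columns L1, L2 and scr from the question, with exactly two L2 levels inside each L1 group (wilcox.test() needs that).

```r
library(dplyr)

# Hypothetical helper (the name is not from the original answer).
# Assumes `data` has columns L1, L2 and scr, and that every L1 group
# contains exactly two L2 levels.
summarise_scr <- function(data) {
  # Overall median and standard error, labelled "All"
  overall <- data %>%
    summarize(mdn = median(scr), se = sd(scr) / sqrt(n())) %>%
    mutate(L1 = "All", L2 = "All")

  # Median and standard error for each L1/L2 combination
  by_group <- data %>%
    group_by(L1, L2) %>%
    summarize(mdn = median(scr), se = sd(scr) / sqrt(n()), .groups = "drop")

  # Wilcoxon p-value within each L1
  p_values <- data %>%
    group_by(L1) %>%
    summarize(pv = wilcox.test(scr ~ L2)$p.value)

  # Stack the statistics, then attach the p-values by L1
  bind_rows(overall, by_group) %>%
    left_join(p_values, by = "L1")
}
```

With the data frame from the question, summarise_scr(df) should give the same five rows as the step-by-step version. The "All" row has no matching L1 in the p-value table, so its pv is NA.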
Alternatively, if you want to do everything in a single pipe and can live with a slightly different output format:
df %>%
  mutate(
    mdn_overall = median(scr),
    se_overall = sd(scr) / sqrt(n())
  ) %>%
  group_by(L1) %>%
  mutate(
    pv = wilcox.test(scr ~ L2)$p.value
  ) %>%
  group_by(L1, L2, mdn_overall, se_overall, pv) %>%
  summarize(
    mdn_group = median(scr),
    se_group = sd(scr) / sqrt(n())
  )
  L1        L2            mdn_overall se_overall    pv mdn_group se_group
  <chr>     <chr>               <dbl>      <dbl> <dbl>     <dbl>    <dbl>
1 Age Class Older Youth          0.75     0.0270 0.589     0.75    0.0422
2 Age Class Younger Youth        0.75     0.0270 0.589     0.672   0.0603
3 Gender    Female               0.75     0.0270 0.639     0.828   0.0562
4 Gender    Male                 0.75     0.0270 0.639     0.781   0.0312
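The question also asks about nested data frames and purrr::map(). One possible shape for that route is sketched here. It is not part of the original answer, and it is not obviously shorter than the versions above. The name summarise_nested is hypothetical, the input is assumed to have the columns L1, L2 and scr, and the .by argument of summarize() is assumed to be available (dplyr 1.1.0 or later, as already used in the question).

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper (the name is not from the original answer).
# Assumes `tbl` has columns L1, L2 and scr, with two L2 levels per L1.
summarise_nested <- function(tbl) {
  by_group <- tbl %>%
    # One row per L1, with the remaining columns in a list-column `data`
    nest(data = -L1) %>%
    # Run the Wilcoxon test on each nested data frame
    mutate(pv = map_dbl(data, ~ wilcox.test(scr ~ L2, data = .x)$p.value)) %>%
    # Back to one row per observation, now carrying the p-value
    unnest(data) %>%
    summarize(mdn = median(scr), se = sd(scr) / sqrt(n()), .by = c(L1, L2, pv))

  # Overall statistics, labelled "All"; pv is filled with NA by bind_rows()
  overall <- tbl %>%
    summarize(L1 = "All", L2 = "All", mdn = median(scr), se = sd(scr) / sqrt(n()))

  bind_rows(overall, by_group)
}
```

Calling summarise_nested(df) on the data from the question should return the same statistics as the three-step version, with the columns in a different order.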