Bind rows and perform a left join from a piped set of lists using the tidyverse approach

Question

Here is the `dput()` output of the data.

library(tidyverse)

df <- structure(list(L1 = c("Age Class", "Age Class", "Age Class", 
"Age Class", "Gender", "Gender", "Gender", "Age Class", "Age Class", 
"Age Class", "Gender", "Gender", "Age Class", "Age Class", "Age Class", 
"Gender"), L2 = c("Older Youth", "Older Youth", "Younger Youth", 
"Younger Youth", "Female", "Female", "Female", "Younger Youth", 
"Older Youth", "Older Youth", "Male", "Male", "Younger Youth", 
"Older Youth", "Older Youth", "Female"), scr = c(0.78125, 0.90625, 
0.90625, 0.6875, 0.875, 0.78125, 1, 0.65625, 0.75, 0.59375, 0.8125, 
0.75, 0.65625, 0.6875, 0.75, 0.75)), row.names = c(NA, -16L), class = "data.frame")

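I want to:

1. Compute the median and standard error as overall statistics.
2. Compute the median and standard error again, grouped by L1 and L2.
3. Run a Wilcoxon test within each L1, since each L1 contains two levels of L2.
4. Combine these three results: a) use `bind_rows()` to stack the results of step 1 and step 2, then b) use `left_join()` to attach the p-values (step 3) to that dataset.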

The desired final result is shown in the image below:

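I tried building a `list()` with one element per step inside a `dplyr` pipe, but handling a `list()`, that is, selecting or filtering its elements in `dplyr` or a piped environment, is cumbersome. The following chunk works, but I would like to reduce the list handling as much as possible. In particular, I feel the second half of the code could be shortened or simplified.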

df %>% 
  list(
    a={.} %>% mutate(L1="All", L2="All") %>% summarise(mdn=median(scr), se=(sd(scr)/sqrt(length(scr))), .by = c(L1, L2)),
    b={.} %>% summarise(mdn=median(scr), se=(sd(scr)/sqrt(length(scr))), .by = c(L1, L2)),
    c={.} %>% summarise(pv= wilcox.test(scr~L2)$p.value, .by = L1)) %>% 
  list(
    d= {.} %>% keep(names(.) %in% c('a','b')) %>% bind_rows(), #Reduce codes from this line
    c= {.} %>% pluck("c")) %>% 
  keep(names(.) %in% c('c','d')) %>%
  reduce(left_join, by="L1") #to this line

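I wonder whether there is scope for using nested data frames here, or any `purrr::map()` approach that would reduce the amount of script/text.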

Answer 1

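To Adriano's point, in my opinion the (by far) simplest approach is to perform these three very different operations separately and then bind the outputs together: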

# Overall statistics
out_1 <- df %>%
  summarize(
    mdn = median(scr),
    se = sd(scr) / sqrt(n())
  ) %>%
  mutate(
    L1 = "All", 
    L2 = "All"
  )

# Statistics by group
out_2 <- df %>%
  group_by(L1, L2) %>%
  summarize(
    mdn = median(scr),
    se = sd(scr) / sqrt(n()),
    .groups = "drop"  # return an ungrouped result and silence the grouping message
  )

# Wilcoxon test
out_3 <- df %>%
  group_by(L1) %>%
  summarize(
    pv = wilcox.test(scr ~ L2)$p.value
  )

# Combine
out <- out_1 %>%
  bind_rows(out_2) %>%
  left_join(out_3, by = "L1")  # join key made explicit

       mdn         se        L1            L2        pv
1 0.750000 0.02702097       All           All        NA
2 0.750000 0.04224854 Age Class   Older Youth 0.5894851
3 0.671875 0.06034703 Age Class Younger Youth 0.5894851
4 0.828125 0.05615588    Gender        Female 0.6385921
5 0.781250 0.03125000    Gender          Male 0.6385921

If you are going to do this repeatedly, you could turn it into a function.
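
As a rough illustration of that idea, the three steps above could be wrapped as follows. This is only a sketch: the function name `summarise_with_test()` is made up for this example, and it assumes the input has columns named L1, L2 and scr, with exactly two L2 levels inside each L1, as in the question's data.

```r
library(dplyr)

# Hypothetical helper (name chosen for this sketch only).
# Assumes columns L1, L2 and scr, as in the question's data.
summarise_with_test <- function(data) {
  # Overall median and standard error, labelled "All"
  overall <- data %>%
    summarize(mdn = median(scr), se = sd(scr) / sqrt(n())) %>%
    mutate(L1 = "All", L2 = "All")

  # Median and standard error for each L1/L2 combination
  by_group <- data %>%
    group_by(L1, L2) %>%
    summarize(mdn = median(scr), se = sd(scr) / sqrt(n()), .groups = "drop")

  # One Wilcoxon p-value per L1 (needs exactly two L2 levels within each L1)
  tests <- data %>%
    group_by(L1) %>%
    summarize(pv = wilcox.test(scr ~ L2)$p.value)

  # Stack the two summaries, then attach the p-values by L1
  overall %>%
    bind_rows(by_group) %>%
    left_join(tests, by = "L1")
}
```

Calling `summarise_with_test(df)` on the question's data should give the same five rows as the combined output above. `wilcox.test()` may warn about ties, because with tied values it cannot compute an exact p-value.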

Alternatively, if you want to do everything in a single pipe and can accept a slightly different output format:

df %>%
  mutate(
    mdn_overall = median(scr),
    se_overall = sd(scr) / sqrt(n())
  ) %>%
  group_by(L1) %>%
  mutate(
    pv = wilcox.test(scr ~ L2)$p.value
  ) %>%
  group_by(L1, L2, mdn_overall, se_overall, pv) %>%
  summarize(
    mdn_group = median(scr),
    se_group = sd(scr) / sqrt(n())
  )

  L1        L2            mdn_overall se_overall    pv mdn_group se_group
  <chr>     <chr>               <dbl>      <dbl> <dbl>     <dbl>    <dbl>
1 Age Class Older Youth          0.75     0.0270 0.589     0.75    0.0422
2 Age Class Younger Youth        0.75     0.0270 0.589     0.672   0.0603
3 Gender    Female               0.75     0.0270 0.639     0.828   0.0562
4 Gender    Male                 0.75     0.0270 0.639     0.781   0.0312
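
On the question's closing point about nested data frames and `purrr::map()`, here is a minimal sketch of how that might look, using `tidyr::nest()` to hold one data frame per L1. The names `std_err()` and `nested_summary()` are invented for this example, and the `.by` argument assumes dplyr 1.1.0 or later (the question's own code already uses it).

```r
library(dplyr)
library(tidyr)
library(purrr)

# Helper defined only for this sketch: standard error of the mean
std_err <- function(x) sd(x) / sqrt(length(x))

# Hypothetical function; assumes columns L1, L2 and scr as in the question
nested_summary <- function(data) {
  data %>%
    nest(rows = -L1) %>%  # one row per L1, its observations in a list-column
    mutate(
      # One Wilcoxon p-value per nested data frame
      pv = map_dbl(rows, ~ wilcox.test(scr ~ L2, data = .x)$p.value),
      # Per-L2 median and standard error within each nested data frame
      stats = map(rows, ~ summarize(.x, mdn = median(scr), se = std_err(scr), .by = L2))
    ) %>%
    select(L1, stats, pv) %>%
    unnest(stats)
}
```

`nested_summary(df)` returns one row per L1/L2 combination with the columns L1, L2, mdn, se and pv. It does not include the overall "All" row, so that would still need to be computed separately and added with `bind_rows()`. Whether this is actually shorter or clearer than three separate summaries is a matter of taste.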
