Binding rows and performing a left join on a set of piped lists with a tidyverse approach


Here is the dput() structure of the data.

library(tidyverse)

df <- structure(list(L1 = c("Age Class", "Age Class", "Age Class", 
"Age Class", "Gender", "Gender", "Gender", "Age Class", "Age Class", 
"Age Class", "Gender", "Gender", "Age Class", "Age Class", "Age Class", 
"Gender"), L2 = c("Older Youth", "Older Youth", "Younger Youth", 
"Younger Youth", "Female", "Female", "Female", "Younger Youth", 
"Older Youth", "Older Youth", "Male", "Male", "Younger Youth", 
"Older Youth", "Older Youth", "Female"), scr = c(0.78125, 0.90625, 
0.90625, 0.6875, 0.875, 0.78125, 1, 0.65625, 0.75, 0.59375, 0.8125, 
0.75, 0.65625, 0.6875, 0.75, 0.75)), row.names = c(NA, -16L), class = "data.frame")

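  1. I want to compute the median and the standard error as overall statistics.

  2. Compute the median and the standard error again, this time grouped by L1 and L2.

  3. Run a Wilcoxon test within each L1 group, since each group contains exactly two L2 levels.

  4. Combine the three results: a) use bind_rows() to stack the results of steps 1 and 2, then b) use left_join() to attach the p-values (step 3) to that dataset.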

The desired final result was shown as an image in the original post.

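I tried building a list() with one dplyr pipeline per step, but working with a list() inside dplyr or a pipe (selecting or filtering its elements) is cumbersome. The chunk below works, but I would like to reduce the list handling as much as possible. In particular, I feel the second half of the code could be shortened or simplified.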

df %>% 
  list(
    a={.} %>% mutate(L1="All", L2="All") %>% summarise(mdn=median(scr), se=(sd(scr)/sqrt(length(scr))), .by = c(L1, L2)),
    b={.} %>% summarise(mdn=median(scr), se=(sd(scr)/sqrt(length(scr))), .by = c(L1, L2)),
    c={.} %>% summarise(pv= wilcox.test(scr~L2)$p.value, .by = L1)) %>% 
  list(
    d= {.} %>% keep(names(.) %in% c('a','b')) %>% bind_rows(), #Reduce codes from this line
    c= {.} %>% pluck("c")) %>% 
  keep(names(.) %in% c('c','d')) %>%
  reduce(left_join, by="L1") #to this line

I am wondering whether there is scope for using nested data frames here, or any purrr::map() approach that would shorten the script.

r dplyr tidyverse

1 Answer

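Following Adriano's point, it seems to me that by far the simplest approach is to perform these three very different operations separately and then bind the outputs together: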

# Overall statistics
out_1 <- df %>%
  summarize(
    mdn = median(scr),
    se = sd(scr) / sqrt(n())
  ) %>%
  mutate(
    L1 = "All", 
    L2 = "All"
  )

# Statistics by group
out_2 <- df %>%
  group_by(L1, L2) %>%
  summarize(
    mdn = median(scr),
    se = sd(scr) / sqrt(n()),
    .groups = "drop"  # return an ungrouped result and silence the grouping message
  )

# Wilcoxon test
out_3 <- df %>%
  group_by(L1) %>%
  summarize(
    pv = wilcox.test(scr ~ L2)$p.value
  )

# Combine
out <- out_1 %>%
  bind_rows(out_2) %>%
  left_join(out_3, by = "L1")  # name the join key explicitly

       mdn         se        L1            L2        pv
1 0.750000 0.02702097       All           All        NA
2 0.750000 0.04224854 Age Class   Older Youth 0.5894851
3 0.671875 0.06034703 Age Class Younger Youth 0.5894851
4 0.828125 0.05615588    Gender        Female 0.6385921
5 0.781250 0.03125000    Gender          Male 0.6385921
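
Each pv value appears on both rows of an L1 block because the test compares the two L2 levels inside that block, and left_join() copies the single result to every row with a matching L1. As a minimal sketch, the Gender test can be reproduced on its own in base R; the `gender` data frame below is an illustrative subset built by hand from the question's data, not an object from the original code. With tied scores, wilcox.test() warns that it cannot compute an exact p-value and falls back to a normal approximation.

```r
# Illustrative subset: only the Gender rows of the question's data
gender <- data.frame(
  L2  = c("Female", "Female", "Female", "Male", "Male", "Female"),
  scr = c(0.875, 0.78125, 1, 0.8125, 0.75, 0.75)
)

# Two-sample Wilcoxon rank-sum test comparing Female and Male scores
p_gender <- wilcox.test(scr ~ L2, data = gender)$p.value
print(p_gender)
```

The printed value should correspond to the 0.6385921 reported for both Gender rows in the table above.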

If you are going to do this repeatedly, you can wrap it in a function.
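
One way such a function might look is sketched below. The name `summarise_scores` is hypothetical, and the function assumes its input has columns named L1, L2 and scr exactly as in the question; it is an illustration of the idea, not code from the original answer. The data is rebuilt compactly so the block runs on its own.

```r
library(dplyr)

# The question's data, rebuilt compactly (L1 is derived from L2)
L2 <- c("Older Youth", "Older Youth", "Younger Youth", "Younger Youth",
        "Female", "Female", "Female", "Younger Youth", "Older Youth",
        "Older Youth", "Male", "Male", "Younger Youth", "Older Youth",
        "Older Youth", "Female")
df <- data.frame(
  L1  = ifelse(L2 %in% c("Female", "Male"), "Gender", "Age Class"),
  L2  = L2,
  scr = c(0.78125, 0.90625, 0.90625, 0.6875, 0.875, 0.78125, 1, 0.65625,
          0.75, 0.59375, 0.8125, 0.75, 0.65625, 0.6875, 0.75, 0.75)
)

# Hypothetical wrapper; assumes columns L1, L2 and scr exist in `data`
summarise_scores <- function(data) {
  # Step 1: overall median and standard error, labelled "All"
  overall <- data %>%
    summarise(mdn = median(scr), se = sd(scr) / sqrt(n())) %>%
    mutate(L1 = "All", L2 = "All")

  # Step 2: the same statistics per L1/L2 combination
  by_group <- data %>%
    group_by(L1, L2) %>%
    summarise(mdn = median(scr), se = sd(scr) / sqrt(n()), .groups = "drop")

  # Step 3: one Wilcoxon p-value per L1 (warns when scores are tied)
  tests <- data %>%
    group_by(L1) %>%
    summarise(pv = wilcox.test(scr ~ L2)$p.value)

  # Step 4: stack steps 1 and 2, then attach the p-values by L1
  bind_rows(overall, by_group) %>%
    left_join(tests, by = "L1")
}

out <- summarise_scores(df)
print(out)
```

Because the "All" row has no matching L1 in the test results, its pv is NA after the join.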

Alternatively, if you want to do everything in a single pipe and can accept a slightly different output format:

df %>%
  mutate(
    mdn_overall = median(scr),
    se_overall = sd(scr) / sqrt(n())
  ) %>%
  group_by(L1) %>%
  mutate(
    pv = wilcox.test(scr ~ L2)$p.value
  ) %>%
  group_by(L1, L2, mdn_overall, se_overall, pv) %>%
  summarize(
    mdn_group = median(scr),
    se_group = sd(scr) / sqrt(n()),
    .groups = "drop"  # return an ungrouped tibble
  )

  L1        L2            mdn_overall se_overall    pv mdn_group se_group
  <chr>     <chr>               <dbl>      <dbl> <dbl>     <dbl>    <dbl>
1 Age Class Older Youth          0.75     0.0270 0.589     0.75    0.0422
2 Age Class Younger Youth        0.75     0.0270 0.589     0.672   0.0603
3 Gender    Female               0.75     0.0270 0.639     0.828   0.0562
4 Gender    Male                 0.75     0.0270 0.639     0.781   0.0312
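
Since the question asks about a purrr::map() route, here is one possible sketch, offered as an assumption rather than as part of the original answer. The idea is to put a copy of the data relabelled as "All" next to the original data in a list, map one and the same summary over both, and stack the results, which replaces the keep()/pluck() handling. The names `stats`, `tests` and `out` are illustrative.

```r
library(dplyr)
library(purrr)

# The question's data, rebuilt compactly (L1 is derived from L2)
L2 <- c("Older Youth", "Older Youth", "Younger Youth", "Younger Youth",
        "Female", "Female", "Female", "Younger Youth", "Older Youth",
        "Older Youth", "Male", "Male", "Younger Youth", "Older Youth",
        "Older Youth", "Female")
df <- data.frame(
  L1  = ifelse(L2 %in% c("Female", "Male"), "Gender", "Age Class"),
  L2  = L2,
  scr = c(0.78125, 0.90625, 0.90625, 0.6875, 0.875, 0.78125, 1, 0.65625,
          0.75, 0.59375, 0.8125, 0.75, 0.65625, 0.6875, 0.75, 0.75)
)

# Steps 1 and 2: the same grouped summary applied to an "All" copy and to df
stats <- list(mutate(df, L1 = "All", L2 = "All"), df) %>%
  map(function(d) {
    d %>%
      group_by(L1, L2) %>%
      summarise(mdn = median(scr), se = sd(scr) / sqrt(n()), .groups = "drop")
  }) %>%
  bind_rows()

# Step 3: one Wilcoxon p-value per L1 (warns when scores are tied)
tests <- df %>%
  group_by(L1) %>%
  summarise(pv = wilcox.test(scr ~ L2)$p.value)

# Step 4: attach the p-values; the "All" row gets NA
out <- left_join(stats, tests, by = "L1")
print(out)
```

Whether this is clearer than three separate pipelines is a matter of taste; it mainly removes the duplicated summarise() call.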