R: pivot_wider does not collapse rows

Question (votes: 0, answers: 1)

This is the dataset I am starting with.

a <- data.frame(outcome=c('asd', NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                          "adhd",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
                exposure=c('x susceptibility', NA,NA,NA,
                           'hospitalised x', NA,NA,NA,
                           'severe x', NA,NA,NA,
                           'x susceptibility', NA,NA,NA,
                           'hospitalised x', NA,NA,NA,
                           'severe x', NA,NA,NA),
                method=rep(c('a','b','c','d'),6),
                or=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                loci_or=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                upci_or=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                p_value=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                egger_int=c(NA,0.00004,NA,NA,NA,0.00009,NA,NA,NA,0.00003,NA,NA,
                            NA,0.00001,NA,NA,NA,0.00002,NA,NA,NA,0.00007,NA,NA),
                egger_int_p=c(NA,0.00004,NA,NA,NA,0.00009,NA,NA,NA,0.00003,NA,NA,
                              NA,0.00001,NA,NA,NA,0.00002,NA,NA,NA,0.00007,NA,NA))

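A visual representation of the above looks like this. Right now there are as many rows per outcome-exposure pair as there are methods. I would like to use tidyr::pivot_wider (or an equivalent) to reshape it so that there is exactly one row per outcome-exposure pair. The NAs in the outcome column let me tell at a glance which outcome a block of rows belongs to.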

In other words, I want the data to look like this:

b <- data.frame(outcome=c('asd', NA,NA,
                          "adhd",NA,NA),
                exposure=c('x susceptibility','hospitalised x','severe x',
                           'x susceptibility','hospitalised x','severe x'),
                a_or=rnorm(6,0,0.004),
                a_loci_or=rnorm(6,0,0.004),
                a_upci_or=rnorm(6,0,0.004),
                a_p_value=rnorm(6,0,0.004),
                b_or=rnorm(6,0,0.004),
                b_loci_or=rnorm(6,0,0.004),
                b_upci_or=rnorm(6,0,0.004),
                b_p_value=rnorm(6,0,0.004),
                c_or=rnorm(6,0,0.004),
                c_loci_or=rnorm(6,0,0.004),
                c_upci_or=rnorm(6,0,0.004),
                c_p_value=rnorm(6,0,0.004),
                egger_int=c(0.00004,0.00009,0.00003,0.00001,0.00002,0.00007),
                egger_int_p=c(0.00004,0.00009,0.00003,0.00001,0.00002,0.00007))

This is what I have done so far:

library(dplyr)  # provides %>% and last_col()

tidy_dev <- a %>%
            # fill missing outcome/exposure values downward from the previous
            # non-missing entry (each value is listed only once per block)
            tidyr::fill(outcome,exposure) %>%
            # changing from long format to wide format
            tidyr::pivot_wider(names_from = method,
                               values_from = or:p_value,
                               # naming scheme: value1_name1, value2_name1 etc
                               names_vary = 'slowest',
                               # how you want to format column names
                               names_glue = '{method}_{.value}') %>%
            # moving Egger intercept and its p-value to the last column
            dplyr::relocate(c(egger_int,
                              egger_int_p),
                            .after = last_col())

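What I get instead is two rows for each outcome-exposure pair. The values in the egger_int, egger_int_p, egger_or, egger_loci_or and egger_uci_or columns are on one row, and the values in the other {method}_{.value} columns are on another row. So I end up with 12 rows when I want 6.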

For reference, the data I get from my attempt looks like this.

Tags: r, dplyr, tidyverse

1 answer (3 votes)

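The issue is your egger_ columns, which for each outcome-exposure pair contain two distinct entries: a value and an NA. By default pivot_wider uses every column that is not named in names_from or values_from to identify a row, so the value and the NA count as different rows and you end up with two rows per pair.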
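
To make that mechanism concrete, here is a minimal sketch on a made-up two-row data frame. The names toy and extra are purely illustrative and do not come from the question; extra plays the role of the egger_ columns, being filled in for one method only.

```r
library(tidyr)

# Toy data (hypothetical): one outcome-exposure pair, two methods.
# `extra` has a value for method "b" only, like the egger_ columns.
toy <- data.frame(
  outcome  = c("asd", "asd"),
  exposure = c("severe x", "severe x"),
  method   = c("a", "b"),
  or       = c(0.1, 0.2),
  extra    = c(NA, 0.5)
)

# Default id_cols: outcome, exposure AND extra identify a row, so
# (asd, severe x, NA) and (asd, severe x, 0.5) remain separate rows.
wide_default <- pivot_wider(toy, names_from = method, values_from = or)
nrow(wide_default)
#> [1] 2

# Restricting id_cols to the pair collapses it to one row.
# `extra` is dropped here because it is no longer used anywhere.
wide_ids <- pivot_wider(
  toy,
  id_cols = c(outcome, exposure),
  names_from = method,
  values_from = or
)
nrow(wide_ids)
#> [1] 1
```

Restricting id_cols alone would therefore lose the egger_ values, which is why the NAs have to be dealt with in some way before or during the pivot.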

One option to fix this is to use another fill, grouped by outcome and exposure, to get rid of the NAs:

library(tidyr)
library(dplyr, warn.conflicts = FALSE)

tidy_dev <- a %>%
  # fill missing outcome/exposure values downward from the previous
  # non-missing entry (each value is listed only once per block)
  tidyr::fill(outcome, exposure) %>%
  group_by(outcome, exposure) %>%
  tidyr::fill(starts_with("egger"), .direction = "downup") %>%
  ungroup() %>%
  # changing from long format to wide format
  tidyr::pivot_wider(
    names_from = method,
    values_from = or:p_value,
    # naming scheme: value1_name1, value2_name1 etc
    names_vary = "slowest",
    # how you want to format column names
    names_glue = "{method}_{.value}"
  ) %>%
  # moving Egger intercept and its p-value to the last column
  dplyr::relocate(
    c(
      egger_int,
      egger_int_p
    ),
    .after = last_col()
  )

tidy_dev
#> # A tibble: 6 × 20
#>   outcome exposure         a_or a_loci_or a_upci_or a_p_value     b_or b_loci_or
#>   <chr>   <chr>           <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
#> 1 asd     x susceptib… -5.26e-3  -0.00262  -0.00427   0.00884 -3.18e-3  0.00598 
#> 2 asd     hospitalise…  3.62e-5   0.00144  -0.00840  -0.00632 -4.75e-3  0.00309 
#> 3 asd     severe x      1.95e-3   0.00268  -0.00250  -0.00438  1.04e-3 -0.000874
#> 4 adhd    x susceptib…  1.55e-3  -0.00782  -0.00452   0.00317 -5.73e-4  0.00455 
#> 5 adhd    hospitalise…  1.10e-3   0.00600  -0.00342  -0.00334 -2.32e-3  0.00676 
#> 6 adhd    severe x      2.88e-3   0.00130   0.00352  -0.00317 -8.45e-4 -0.00134 
#> # ℹ 12 more variables: b_upci_or <dbl>, b_p_value <dbl>, c_or <dbl>,
#> #   c_loci_or <dbl>, c_upci_or <dbl>, c_p_value <dbl>, d_or <dbl>,
#> #   d_loci_or <dbl>, d_upci_or <dbl>, d_p_value <dbl>, egger_int <dbl>,
#> #   egger_int_p <dbl>
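
As a possible alternative, and not the approach used above, pivot_wider also has id_cols and unused_fn arguments (to my knowledge available from tidyr 1.2.0, the same release that names_vary needs). The sketch below runs on a cut-down, made-up version of the data called a_small, with two methods and two value columns; first_non_na is a hypothetical helper defined here, not a tidyr function. The last step is optional and only mirrors the layout asked for in the question, where each outcome is shown once per block.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

# Cut-down stand-in for `a` (hypothetical values, same column names).
set.seed(1)
a_small <- data.frame(
  outcome     = c("asd", NA, NA, NA, "adhd", NA, NA, NA),
  exposure    = c("severe x", NA, "hospitalised x", NA,
                  "severe x", NA, "hospitalised x", NA),
  method      = rep(c("a", "b"), 4),
  or          = rnorm(8, 0, 0.004),
  p_value     = rnorm(8, 0, 0.004),
  egger_int   = c(NA, 4e-5, NA, 9e-5, NA, 1e-5, NA, 2e-5),
  egger_int_p = c(NA, 4e-5, NA, 9e-5, NA, 1e-5, NA, 2e-5)
)

# Hypothetical helper: first non-missing value of a vector (NA if none).
first_non_na <- function(x) x[!is.na(x)][1]

tidy_alt <- a_small %>%
  tidyr::fill(outcome, exposure) %>%
  tidyr::pivot_wider(
    # only the pair identifies a row ...
    id_cols = c(outcome, exposure),
    names_from = method,
    values_from = or:p_value,
    names_vary = "slowest",
    names_glue = "{method}_{.value}",
    # ... and the remaining (egger_) columns are summarised per pair
    unused_fn = first_non_na
  )

# Optional: show each outcome only once per block, as in the desired layout.
# Assumes the rows of one outcome are contiguous.
tidy_alt_display <- tidy_alt %>%
  mutate(outcome = replace(outcome, duplicated(outcome), NA))
```

The grouped fill in the answer and this unused_fn route should give the same egger_ values whenever each pair has exactly one non-missing entry; with several different non-missing values per pair, first_non_na would silently keep only the first.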