This is the dataset I am starting with.
# Column names match the ones used in the pipeline below (outcome, p_value).
a <- data.frame(
  outcome = c('asd', NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
              "adhd", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  exposure = c('x susceptibility', NA, NA, NA,
               'hospitalised x', NA, NA, NA,
               'severe x', NA, NA, NA,
               'x susceptibility', NA, NA, NA,
               'hospitalised x', NA, NA, NA,
               'severe x', NA, NA, NA),
  method = rep(c('a', 'b', 'c', 'd'), 6),
  or = c(rnorm(3, 0, 0.004), NA, rnorm(11, 0, 0.004), NA, rnorm(8, 0, 0.004)),
  loci_or = c(rnorm(3, 0, 0.004), NA, rnorm(11, 0, 0.004), NA, rnorm(8, 0, 0.004)),
  upci_or = c(rnorm(3, 0, 0.004), NA, rnorm(11, 0, 0.004), NA, rnorm(8, 0, 0.004)),
  p_value = c(rnorm(3, 0, 0.004), NA, rnorm(11, 0, 0.004), NA, rnorm(8, 0, 0.004)),
  egger_int = c(NA, 0.00004, NA, NA, NA, 0.00009, NA, NA, NA, 0.00003, NA, NA,
                NA, 0.00001, NA, NA, NA, 0.00002, NA, NA, NA, 0.00007, NA, NA),
  egger_int_p = c(NA, 0.00004, NA, NA, NA, 0.00009, NA, NA, NA, 0.00003, NA, NA,
                  NA, 0.00001, NA, NA, NA, 0.00002, NA, NA, NA, 0.00007, NA, NA)
)
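Right now there is one row per method for each outcome-exposure pair. I would like to use tidyr::pivot_wider (or an equivalent) so that there is exactly one row per outcome-exposure pair. The NAs in the outcome column let me see at a glance which outcome a block of rows belongs to.
In other words, I want the data to look like this: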
b <- data.frame(
  outcome = c('asd', NA, NA,
              "adhd", NA, NA),
  exposure = c('x susceptibility', 'hospitalised x', 'severe x',
               'x susceptibility', 'hospitalised x', 'severe x'),
  a_or = rnorm(6, 0, 0.004),
  a_loci_or = rnorm(6, 0, 0.004),
  a_upci_or = rnorm(6, 0, 0.004),
  a_p_value = rnorm(6, 0, 0.004),
  b_or = rnorm(6, 0, 0.004),
  b_loci_or = rnorm(6, 0, 0.004),
  b_upci_or = rnorm(6, 0, 0.004),
  b_p_value = rnorm(6, 0, 0.004),
  c_or = rnorm(6, 0, 0.004),
  c_loci_or = rnorm(6, 0, 0.004),
  c_upci_or = rnorm(6, 0, 0.004),
  c_p_value = rnorm(6, 0, 0.004),
  egger_int = c(0.00004, 0.00009, 0.00003, 0.00001, 0.00002, 0.00007),
  egger_int_p = c(0.00004, 0.00009, 0.00003, 0.00001, 0.00002, 0.00007)
)
This is what I have done so far:
library(tidyr)
library(dplyr, warn.conflicts = FALSE)
tidy_dev <- a %>%
  # fill the NAs in these columns downward from the last non-missing entry,
  # so every row carries its outcome and exposure
  tidyr::fill(outcome, exposure) %>%
  # changing from long format to wide format
  tidyr::pivot_wider(names_from = method,
                     values_from = or:p_value,
                     # column order: value1_name1, value2_name1 etc
                     names_vary = 'slowest',
                     # how you want to format column names
                     names_glue = '{method}_{.value}') %>%
  # moving Egger intercept and its p-value to the last columns
  dplyr::relocate(c(egger_int,
                    egger_int_p),
                  .after = last_col())
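However, what I get is two rows for each outcome-exposure pair. The values in the egger_int, egger_int_p, egger_or, egger_loci_or and egger_upci_or columns sit on one row, and the values in the remaining {method}_{.value} columns sit on another row. So I end up with 12 rows when I want 6.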
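The issue is that your egger_ columns contain two distinct values for each pair of outcome and exposure: one actual value and one NA. pivot_wider treats every column that is not named in names_from or values_from as an identifier column, so each distinct value of egger_int and egger_int_p starts its own row. That is why you end up with two rows per pair.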
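The effect described above can be reproduced on a much smaller table. The following is only an illustrative sketch: the tibble `toy` and its values are made up for the demonstration and are not part of the question's data.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical miniature of the question's data: one outcome-exposure pair,
# four methods, and an Egger intercept reported on a single row only.
toy <- tibble(
  outcome   = "asd",
  exposure  = "x susceptibility",
  method    = c("a", "b", "c", "d"),
  or        = c(0.1, 0.2, 0.3, 0.4),
  egger_int = c(NA, 0.00004, NA, NA)
)

# egger_int is not in names_from or values_from, so it acts as an id column.
# It takes two distinct values (NA and 0.00004), hence two output rows.
wide <- toy %>%
  pivot_wider(names_from = method, values_from = or)

nrow(wide)
```

Here the single pair is split into two rows: one holding the value for method b together with the intercept, and one holding the values for the other methods with an NA intercept.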
One option to fix this is a second fill, grouped by outcome and exposure, to get rid of the NAs:
library(tidyr)
library(dplyr, warn.conflicts = FALSE)
tidy_dev <- a %>%
  # fill the NAs in these columns downward from the last non-missing entry,
  # so every row carries its outcome and exposure
  tidyr::fill(outcome, exposure) %>%
  # within each outcome-exposure pair, spread the single Egger value
  # to all rows so the egger_ columns no longer split the pair
  group_by(outcome, exposure) %>%
  tidyr::fill(starts_with("egger"), .direction = "downup") %>%
  ungroup() %>%
  # changing from long format to wide format
  tidyr::pivot_wider(
    names_from = method,
    values_from = or:p_value,
    # column order: value1_name1, value2_name1 etc
    names_vary = "slowest",
    # how you want to format column names
    names_glue = "{method}_{.value}"
  ) %>%
  # moving Egger intercept and its p-value to the last columns
  dplyr::relocate(
    c(
      egger_int,
      egger_int_p
    ),
    .after = last_col()
  )
tidy_dev
#> # A tibble: 6 × 20
#> outcome exposure a_or a_loci_or a_upci_or a_p_value b_or b_loci_or
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 asd x susceptib… -5.26e-3 -0.00262 -0.00427 0.00884 -3.18e-3 0.00598
#> 2 asd hospitalise… 3.62e-5 0.00144 -0.00840 -0.00632 -4.75e-3 0.00309
#> 3 asd severe x 1.95e-3 0.00268 -0.00250 -0.00438 1.04e-3 -0.000874
#> 4 adhd x susceptib… 1.55e-3 -0.00782 -0.00452 0.00317 -5.73e-4 0.00455
#> 5 adhd hospitalise… 1.10e-3 0.00600 -0.00342 -0.00334 -2.32e-3 0.00676
#> 6 adhd severe x 2.88e-3 0.00130 0.00352 -0.00317 -8.45e-4 -0.00134
#> # ℹ 12 more variables: b_upci_or <dbl>, b_p_value <dbl>, c_or <dbl>,
#> # c_loci_or <dbl>, c_upci_or <dbl>, c_p_value <dbl>, d_or <dbl>,
#> # d_loci_or <dbl>, d_upci_or <dbl>, d_p_value <dbl>, egger_int <dbl>,
#> # egger_int_p <dbl>
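As a possible alternative to the second fill, pivot_wider can be told which columns identify a row and how to collapse the leftover ones. The sketch below assumes tidyr 1.2.0 or later (the release that also introduced names_vary), where the id_cols and unused_fn arguments are available. The tibble `toy` and the helper `first_non_na` are hypothetical names introduced only for this illustration.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical miniature of the question's data: two outcome-exposure pairs,
# two methods each, and one Egger intercept per pair.
toy <- tibble(
  outcome   = rep("asd", 4),
  exposure  = rep(c("x susceptibility", "severe x"), each = 2),
  method    = rep(c("a", "b"), 2),
  or        = c(0.1, 0.2, 0.3, 0.4),
  egger_int = c(NA, 0.00004, NA, 0.00003)
)

# Hypothetical helper: keep the first non-missing value of a column.
first_non_na <- function(x) x[!is.na(x)][1]

wide <- toy %>%
  pivot_wider(
    # only these two columns identify a row
    id_cols     = c(outcome, exposure),
    names_from  = method,
    values_from = or,
    # egger_int is now "unused"; summarise it per row instead of dropping it
    unused_fn   = first_non_na
  )

wide
```

With this approach the egger_ columns never act as identifiers, so each pair stays on one row without a grouped fill. Whether it reads better than the fill-based version is a matter of taste.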