How can I mutate 10 columns that contain TRUE if the gene is inside the module and FALSE if it is not?
library(tidyverse)

# Example expression data: 10 genes x 4 samples (values are random, so they differ between runs)
gene_express <- data.frame(
  gene = c('gene1', 'gene2', 'gene3', 'gene4', 'gene5',
           'gene6', 'gene7', 'gene8', 'gene9', 'gene10'),
  sample1 = sample(0:10, 10),
  sample2 = sample(0:10, 10),
  sample3 = sample(0:10, 10),
  sample4 = sample(0:10, 10)
)

# Gene membership of each module
module1 <- c('gene1', 'gene2', 'gene10', 'gene8')
module2 <- c('gene2', 'gene9', 'gene6', 'gene5', 'gene10')
module3 <- c('gene4', 'gene10', 'gene1', 'gene8')
module4 <- c('gene5', 'gene8', 'gene2', 'gene7', 'gene6', 'gene5', 'gene10')
module5 <- c('gene2', 'gene9', 'gene6', 'gene5', 'gene10')
module6 <- c('gene4', 'gene10', 'gene1', 'gene8')

Module_list <- list(module1, module2, module3, module4, module5, module6)
names(Module_list) <- c('module1', 'module2', 'module3',
                        'module4', 'module5', 'module6')
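In reality I have hundreds of these modules, stored in a named list like `Module_list` in my example. How can I mutate the `gene_express` data frame so that each module name becomes a new column containing TRUE if the gene is inside the module and FALSE if it is not?
The manual way is to spell out each module's members inside mutate(), as I did here: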
# Gene sets below follow Module_list
gene_express %>% mutate(
  module1 = case_match(gene, c("gene1", "gene2", "gene8", "gene10") ~ TRUE, .default = FALSE),
  module2 = case_match(gene, c("gene2", "gene9", "gene6", "gene5", "gene10") ~ TRUE, .default = FALSE),
  module3 = case_match(gene, c("gene4", "gene10", "gene1", "gene8") ~ TRUE, .default = FALSE),
  module4 = case_match(gene, c("gene5", "gene2", "gene7", "gene8", "gene6", "gene10") ~ TRUE, .default = FALSE),
  module5 = case_match(gene, c("gene2", "gene9", "gene6", "gene5", "gene10") ~ TRUE, .default = FALSE),
  module6 = case_match(gene, c("gene4", "gene10", "gene1", "gene8") ~ TRUE, .default = FALSE))
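What I want is to avoid specifying the modules by hand in mutate().
Maybe something like this? Here I put the per-module gene lists into a single data frame, join it to the original data, and fill the elements that did not join with FALSE.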
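# Stack the modules into one long data frame: one row per (module, gene) pair
Module_df <- Module_list |>
  map_dfr(as.data.frame, .id = "module") |>
  rename(gene = 2)

# Join the membership flag onto the expression data, then spread the modules into columns
gene_express |>
  left_join(Module_df |> mutate(val = TRUE), by = "gene") |>
  pivot_wider(names_from = module, values_from = val,
              values_fn = first, values_fill = FALSE)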
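Result (the extra `NA` column appears because gene3 belongs to no module, so the left join leaves its module value missing):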
# A tibble: 10 × 12
   gene   sample1 sample2 sample3 sample4 module1 module3 module6 module2 module4 module5 `NA`
   <chr>    <int>   <int>   <int>   <int> <lgl>   <lgl>   <lgl>   <lgl>   <lgl>   <lgl>   <lgl>
 1 gene1       10       0       3       4 TRUE    TRUE    TRUE    FALSE   FALSE   FALSE   FALSE
 2 gene2        5       8       5       5 TRUE    FALSE   FALSE   TRUE    TRUE    TRUE    FALSE
 3 gene3        8       9       7       2 FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   NA
 4 gene4        1       5       9       0 FALSE   TRUE    TRUE    FALSE   FALSE   FALSE   FALSE
 5 gene5        4       4       8       3 FALSE   FALSE   FALSE   TRUE    TRUE    TRUE    FALSE
 6 gene6        6      10       0       9 FALSE   FALSE   FALSE   TRUE    TRUE    TRUE    FALSE
 7 gene7        3       1       1       7 FALSE   FALSE   FALSE   FALSE   TRUE    FALSE   FALSE
 8 gene8        2       3       6       6 TRUE    TRUE    TRUE    FALSE   TRUE    FALSE   FALSE
 9 gene9        0       2       4       1 FALSE   FALSE   FALSE   TRUE    FALSE   TRUE    FALSE
10 gene10       7       6       2      10 TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    FALSE
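If the stray `NA` column and the reordered module columns are unwanted, another option is to skip the join and test membership directly with `%in%`. The following is only a sketch in base R, not the method used above; it rebuilds a cut-down version of the question's data (two modules, one placeholder sample column) so that it runs on its own.

```r
# Toy data mirroring the question; the sample values are placeholders.
gene_express <- data.frame(
  gene = paste0("gene", 1:10),
  sample1 = 10:1
)
Module_list <- list(
  module1 = c("gene1", "gene2", "gene10", "gene8"),
  module2 = c("gene2", "gene9", "gene6", "gene5", "gene10")
)

# One logical vector per module: TRUE where the gene is a member.
# lapply() keeps the list names, which become the new column names.
membership <- lapply(Module_list, function(genes) gene_express$gene %in% genes)

# Append the membership columns to the original data frame.
result <- cbind(gene_express, as.data.frame(membership))
print(result)
```

Because `%in%` only asks whether a gene occurs in the module, a gene listed twice (like gene5 in module4) does no harm, a gene in no module (like gene3) simply gets FALSE everywhere, and the columns come out in the order of `Module_list`.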