How to convert a table containing factors into a table containing counts (in R)?

Question

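I am running a GSEA analysis (with the clusterProfiler package) and want to perform a leading-edge analysis. To do that, I need to extract the raw data from the gseaResult object.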

#FYI my code looks like this:
GSEA_GO <- gseGO(geneList=gene_list, keyType = "SYMBOL", OrgDb = org.Hs.eg.db)
View(data.frame(GSEA_GO@result))

#after extraction and data transformation, this is a reprex of what I end with:
#one letter being a gene name (included in the leading edge), and "GSx" being a gene set
GS1 <- c("a", "b", "c", "d", "e", "f") 
GS2 <- c("b", "c", "d", "e", "f", "g") 
GS3 <- c("a", "b", "c", NA,NA,NA) 
GS4 <- c("a", "d", "e", "g", NA, NA) 
GS5 <- c("a", "b", "c", "d", NA, NA) 
df <- data.frame(rbind(GS1, GS2, GS3, GS4, GS5))

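To go further, I have to convert this table into another table with one row per gene set and one column per gene, where each cell marks the presence (= 1) or absence (= 0) of that gene in that gene set. For example, the row for GS3 would read 1 1 1 0 0 0 0 across the genes a to g.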

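Of course, I have hundreds of genes and hundreds of gene sets, and I don't want to do everything by hand with ifelse. Can anyone point me in the right direction? Thanks!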

Answer 1

There may be more elegant ways to do this, but I would melt the table and then cast it again:

# create id column
df$id <- rownames(df)

# melted
df_melt <- df |>
    data.table::as.data.table() |>
    data.table::melt(id.vars = "id") |>
    na.omit()

> head(df_melt)
    id variable value
1: GS1       X1     a
2: GS2       X1     b
3: GS3       X1     a
4: GS4       X1     a
5: GS5       X1     a
6: GS1       X2     b

Then you can cast it back to wide format, with one column per gene:

# wide
df_wide <- data.table::dcast(df_melt, id ~ value)

> df_wide
    id    a    b    c    d    e    f    g
1: GS1    a    b    c    d    e    f <NA>
2: GS2 <NA>    b    c    d    e    f    g
3: GS3    a    b    c <NA> <NA> <NA> <NA>
4: GS4    a <NA> <NA>    d    e <NA>    g
5: GS5    a    b    c    d <NA> <NA> <NA>
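Side note: the cast can also produce the 0/1 values directly, which would make the recoding step below unnecessary. This is only a sketch on the toy data from the question; `df_long` and `df_counts` are names introduced here for illustration, and it assumes each gene appears at most once per gene set (otherwise the cells hold counts greater than 1). If you keep `df_wide` as built above, simply carry on with the next step.

```r
library(data.table)

# Same toy data as in the question
GS1 <- c("a", "b", "c", "d", "e", "f")
GS2 <- c("b", "c", "d", "e", "f", "g")
GS3 <- c("a", "b", "c", NA, NA, NA)
GS4 <- c("a", "d", "e", "g", NA, NA)
GS5 <- c("a", "b", "c", "d", NA, NA)
df <- data.frame(rbind(GS1, GS2, GS3, GS4, GS5))
df$id <- rownames(df)

# Long format: one row per (gene set, gene) pair, NAs dropped
df_long <- na.omit(melt(as.data.table(df), id.vars = "id"))

# Count the pairs while casting; combinations that never occur are
# filled with length(<empty vector>), i.e. 0
df_counts <- dcast(df_long, id ~ value,
                   fun.aggregate = length, value.var = "value")

print(df_counts)
```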

Then you can recode every column except id to 1 (if the gene is present) or 0 (if it is absent).

# get the gene (letter) columns only; these live in df_wide, not df
genes <- colnames(df_wide)[colnames(df_wide) != "id"]

# change all gene cols to be 1 if present, 0 if absent
df_wide[, (genes) := lapply(.SD, function(x) ifelse(is.na(x), 0, 1)), .SDcols = genes]

> df_wide
    id a b c d e f g
1: GS1 1 1 1 1 1 1 0
2: GS2 0 1 1 1 1 1 1
3: GS3 1 1 1 0 0 0 0
4: GS4 1 0 0 1 1 0 1
5: GS5 1 1 1 1 0 0 0
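For comparison, the same presence/absence matrix can probably be built in base R with `table()`, without any extra packages. This is a hedged sketch on the question's toy data, not the approach used in the answer above; `m`, `counts` and `presence` are illustrative names.

```r
# Same toy data as in the question
GS1 <- c("a", "b", "c", "d", "e", "f")
GS2 <- c("b", "c", "d", "e", "f", "g")
GS3 <- c("a", "b", "c", NA, NA, NA)
GS4 <- c("a", "d", "e", "g", NA, NA)
GS5 <- c("a", "b", "c", "d", NA, NA)
df <- data.frame(rbind(GS1, GS2, GS3, GS4, GS5))

# Character matrix: rows are gene sets, cells are gene names or NA
m <- as.matrix(df)

# Cross-tabulate gene set against gene; table() drops NA by default
counts <- table(gene_set = rownames(m)[as.vector(row(m))],
                gene     = as.vector(m))

# Clamp to 0/1 in case a gene is listed more than once in a set
presence <- matrix(as.integer(counts > 0),
                   nrow = nrow(counts),
                   dimnames = dimnames(counts))

print(presence)
```

The result is a plain integer matrix with gene sets as row names and genes as column names, so it can be passed to `as.data.frame()` if a data frame is needed.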
