How do I convert a table of factors into a table of counts (in R)?

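I am running a GSEA analysis (from the clusterProfiler package) and want to perform a leading-edge analysis. To do that, I need to extract the raw data from the gseaResult object.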

#FYI my code looks like this:
GSEA_GO <- gseGO(geneList=gene_list, keyType = "SYMBOL", OrgDb = org.Hs.eg.db)
View(data.frame(GSEA_GO@result))
#after extraction and data transformation, this is a reprex of what I end with:
#one letter being a gene name (included in the leading edge), and "GSx" being a gene set
GS1 <- c("a", "b", "c", "d", "e", "f") 
GS2 <- c("b", "c", "d", "e", "f", "g") 
GS3 <- c("a", "b", "c", NA,NA,NA) 
GS4 <- c("a", "d", "e", "g", NA, NA) 
GS5 <- c("a", "b", "c", "d", NA, NA) 
df <- data.frame(rbind(GS1, GS2, GS3, GS4, GS5))

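To go further, I need to convert this table into another one in which each column is a gene and each value indicates whether that gene is present (= 1) or absent (= 0) in the gene set (i.e. the row).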

Of course, I have hundreds of genes and hundreds of gene sets, and I don't want to do it all by hand with ifelse. Can anyone point me in the right direction? Thanks!

Tags: r, dataframe, transformation

1 answer

There may be a more elegant way to do this, but I would melt the data to long format and then cast it back to wide:

# create id column
df$id <- rownames(df)

# melted
df_melt <- df |>
    data.table::as.data.table() |>
    data.table::melt(id.vars = "id") |>
    na.omit()
> head(df_melt)
    id variable value
1: GS1       X1     a
2: GS2       X1     b
3: GS3       X1     a
4: GS4       X1     a
5: GS5       X1     a
6: GS1       X2     b

You can then cast it back to wide format, with one column per gene:

# wide
df_wide <- data.table::dcast(df_melt, id ~ value)
> df_wide
    id    a    b    c    d    e    f    g
1: GS1    a    b    c    d    e    f <NA>
2: GS2 <NA>    b    c    d    e    f    g
3: GS3    a    b    c <NA> <NA> <NA> <NA>
4: GS4    a <NA> <NA>    d    e <NA>    g
5: GS5    a    b    c    d <NA> <NA> <NA>
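
A possible shortcut, assuming each gene appears at most once per gene set: `dcast` can count the rows in each cell directly when given `fun.aggregate = length`, which should give the 0/1 table in a single cast. The sketch below rebuilds the toy data from the question so that it runs on its own; `df_counts` is a name introduced here purely for illustration.

```r
library(data.table)

# Same toy data as in the question; rbind() keeps the names as row names.
df <- data.frame(rbind(
  GS1 = c("a", "b", "c", "d", "e", "f"),
  GS2 = c("b", "c", "d", "e", "f", "g"),
  GS3 = c("a", "b", "c", NA, NA, NA),
  GS4 = c("a", "d", "e", "g", NA, NA),
  GS5 = c("a", "b", "c", "d", NA, NA)
))
df$id <- rownames(df)

# Long format: one row per (gene set, gene) pair, padding NAs dropped.
df_melt <- na.omit(melt(as.data.table(df), id.vars = "id"))

# length() counts the rows in each (id, gene) cell; cells with no rows
# are filled with length() of an empty vector, i.e. 0.
df_counts <- dcast(df_melt, id ~ value,
                   fun.aggregate = length, value.var = "value")
print(df_counts)
```

If a gene could be listed more than once in a set, the cells would hold counts larger than 1, so you would still need to cap them at 1 to get a pure presence/absence table.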

You can then recode every column of df_wide except id as 1 (present) or 0 (absent).

# gene columns only: every column of df_wide except id
# (df itself only has X1..X6, so the names must come from df_wide)
genes <- setdiff(colnames(df_wide), "id")

# change all gene cols to be 1 if present, 0 if absent
df_wide[, (genes) := lapply(.SD, function(x) ifelse(is.na(x), 0, 1)), .SDcols = genes]
> df_wide
    id a b c d e f g
1: GS1 1 1 1 1 1 1 0
2: GS2 0 1 1 1 1 1 1
3: GS3 1 1 1 0 0 0 0
4: GS4 1 0 0 1 1 0 1
5: GS5 1 1 1 1 0 0 0
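
If you are starting from the gseaResult object itself, a base R sketch along the following lines might save the intermediate reshaping. It assumes the result table stores the leading-edge genes as a "/"-separated string in a column named `core_enrichment`, next to an `ID` column; please check this against your own `GSEA_GO@result`. The data frame `res` is a hypothetical stand-in for that table, and `edge_genes`, `long` and `presence` are illustrative names.

```r
# Hypothetical stand-in for GSEA_GO@result (assumed layout): one row per
# gene set, leading-edge genes stored as a "/"-separated string.
res <- data.frame(
  ID = c("GS1", "GS2", "GS3", "GS4", "GS5"),
  core_enrichment = c("a/b/c/d/e/f", "b/c/d/e/f/g", "a/b/c", "a/d/e/g", "a/b/c/d"),
  stringsAsFactors = FALSE
)

# Split each string into a character vector of genes, named by gene set.
edge_genes <- setNames(strsplit(res$core_enrichment, "/", fixed = TRUE), res$ID)

# stack() gives a long table: 'values' holds the gene, 'ind' the gene set.
long <- stack(edge_genes)

# Cross-tabulate gene set by gene, then cap at 1 in case a gene is repeated.
tab <- table(long$ind, long$values)
presence <- matrix(as.integer(tab > 0), nrow = nrow(tab),
                   dimnames = list(rownames(tab), colnames(tab)))
print(presence)
```

This returns a plain integer matrix with gene sets as rows and genes as columns, which you can wrap in `as.data.frame()` if you need a data frame.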