How do I remove everything from a string in R except certain specific words?


After researching for a while and trying sub and gsub, I have not found a way to do what I want.

Input:

structure(list(submitter_id = c("TCGA-B6-A0RH-01A-21R-A115-07", 
"TCGA-BH-A1FU-11A-23R-A14D-07", "TCGA-BH-A1FU-01A-11R-A14D-07", 
"TCGA-AR-A0TX-01A-11R-A084-07", "TCGA-A1-A0SE-01A-11R-A084-07", 
"TCGA-BH-A1FC-11A-32R-A13Q-07", "TCGA-OL-A5D6-01A-21R-A27Q-07", 
"TCGA-E2-A1IK-01A-11R-A144-07", "TCGA-AC-A2FM-11B-32R-A19W-07", 
"TCGA-AN-A0FT-01A-11R-A034-07"), sample_type = c("Primary Tumor", 
"Solid Tissue Normal", "Primary Tumor", "Primary Tumor", "Metastatic", 
"Solid Tissue Normal", "Primary Tumor", "Primary Tumor", "Solid Tissue Normal", 
"Primary Tumor")), row.names = c(NA, 10L), class = "data.frame")

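What I want is to keep only "Tumor" or "Normal" from each string in the sample_type column (when one of them is present) and delete everything else in the string. In addition, I only want to keep the rows that contain "Tumor" or "Normal".

Expected output: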

structure(list(submitter_id = c("TCGA-B6-A0RH-01A-21R-A115-07", 
"TCGA-BH-A1FU-11A-23R-A14D-07", "TCGA-BH-A1FU-01A-11R-A14D-07", 
"TCGA-AR-A0TX-01A-11R-A084-07", "TCGA-BH-A1FC-11A-32R-A13Q-07", 
"TCGA-OL-A5D6-01A-21R-A27Q-07", "TCGA-E2-A1IK-01A-11R-A144-07", 
"TCGA-AC-A2FM-11B-32R-A19W-07", "TCGA-AN-A0FT-01A-11R-A034-07"
), sample_type = c("Tumor", "Normal", "Tumor", "Tumor", "Normal", 
"Tumor", "Tumor", "Normal", "Tumor")), row.names = c(NA, 9L), class = "data.frame")

Thank you.

I tried gsub, sub and substr, but they did not work because the strings vary in length.

r regex
1 answer
library(tidyverse)

df <- structure(list(submitter_id = c(
  "TCGA-B6-A0RH-01A-21R-A115-07",
  "TCGA-BH-A1FU-11A-23R-A14D-07", "TCGA-BH-A1FU-01A-11R-A14D-07",
  "TCGA-AR-A0TX-01A-11R-A084-07", "TCGA-A1-A0SE-01A-11R-A084-07",
  "TCGA-BH-A1FC-11A-32R-A13Q-07", "TCGA-OL-A5D6-01A-21R-A27Q-07",
  "TCGA-E2-A1IK-01A-11R-A144-07", "TCGA-AC-A2FM-11B-32R-A19W-07",
  "TCGA-AN-A0FT-01A-11R-A034-07"
), sample_type = c(
  "Primary Tumor",
  "Solid Tissue Normal", "Primary Tumor", "Primary Tumor", "Metastatic",
  "Solid Tissue Normal", "Primary Tumor", "Primary Tumor", "Solid Tissue Normal",
  "Primary Tumor"
)), row.names = c(NA, 10L), class = "data.frame")

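# str_extract() returns the first "Tumor" or "Normal" found in each string,
# or NA when neither word is present (e.g. "Metastatic").
df |>
  mutate(sample_type = str_extract(sample_type, "Tumor|Normal")) |>
  drop_na(sample_type) # drop the rows that had no match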
#>                   submitter_id sample_type
#> 1 TCGA-B6-A0RH-01A-21R-A115-07       Tumor
#> 2 TCGA-BH-A1FU-11A-23R-A14D-07      Normal
#> 3 TCGA-BH-A1FU-01A-11R-A14D-07       Tumor
#> 4 TCGA-AR-A0TX-01A-11R-A084-07       Tumor
#> 5 TCGA-BH-A1FC-11A-32R-A13Q-07      Normal
#> 6 TCGA-OL-A5D6-01A-21R-A27Q-07       Tumor
#> 7 TCGA-E2-A1IK-01A-11R-A144-07       Tumor
#> 8 TCGA-AC-A2FM-11B-32R-A19W-07      Normal
#> 9 TCGA-AN-A0FT-01A-11R-A034-07       Tumor

Created on 2024-04-13 with reprex v2.1.0
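
Since the question mentions sub and gsub, here is a minimal base R sketch of the same idea, in case avoiding the tidyverse dependency is useful. It is not part of the original answer. It assumes a shortened three-row copy of the data, and the names `pattern`, `keep`, `result` and `via_sub` are only illustrative.

```r
# Shortened copy of the question's data (three rows, for illustration only)
df <- data.frame(
  submitter_id = c(
    "TCGA-B6-A0RH-01A-21R-A115-07",
    "TCGA-BH-A1FU-11A-23R-A14D-07",
    "TCGA-A1-A0SE-01A-11R-A084-07"
  ),
  sample_type = c("Primary Tumor", "Solid Tissue Normal", "Metastatic"),
  stringsAsFactors = FALSE
)

pattern <- "Tumor|Normal"

# grepl() is TRUE for rows where the pattern occurs anywhere in the string
keep <- grepl(pattern, df$sample_type)
result <- df[keep, , drop = FALSE]

# regexpr() locates the first match and regmatches() extracts the matched text;
# every remaining row matches, so the lengths line up
result$sample_type <- regmatches(
  result$sample_type,
  regexpr(pattern, result$sample_type)
)

# Reset row names so they run 1..n
rownames(result) <- NULL
print(result)

# Variant with sub(): capture the keyword and replace the whole string with it.
# sub() returns non-matching strings unchanged, so the filter is applied first.
via_sub <- sub(".*(Tumor|Normal).*", "\\1", df$sample_type[keep])
```

Neither variant depends on the length of the string or the position of the keyword, which is where substr runs into trouble. Filtering with grepl() before extracting matters because sub() and gsub() return non-matching strings such as "Metastatic" unchanged rather than as NA.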
