Loop over specific column data and add the result as a new column in R

Question

I have a data frame df with the following information:

df <- structure(list(Samples = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 2L, 1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L, 1L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L, 1L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 2L), .Label = c("Sample1", "Sample10", "Sample2", 
"Sample3", "Sample4", "Sample5", "Sample6", "Sample7", "Sample8", 
"Sample9"), class = "factor"), patient.vital_status = c(0L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 0L, 1L), years = c(3.909589041, 1.457534247, 
2.336986301, 5.010958904, 1.665753425, 1.81369863, 1.191780822, 
4.687671233, 2.167123288, 1.95890411, 3.909589041, 1.457534247, 
2.336986301, 5.010958904, 1.665753425, 1.81369863, 1.191780822, 
4.687671233, 2.167123288, 1.95890411, 3.909589041, 1.457534247, 
2.336986301, 5.010958904, 1.665753425, 1.81369863, 1.191780822, 
4.687671233, 2.167123288, 1.95890411, 3.909589041, 1.457534247, 
2.336986301, 5.010958904, 1.665753425, 1.81369863, 1.191780822, 
4.687671233, 2.167123288, 1.95890411), Genes = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A1BG", "A1CF", "A2M", 
"A2ML1"), class = "factor"), value = c(0.034459012, 0.017698878, 
0.023313851, 0.010456762, 0.032674019, 0.037561831, 0.03380681, 
0, 0.019954956, 0.012392427, 0.835801613, 2.265192447, 2.431409095, 
5.012117956, 2.139962802, 2.371946704, 4.555234385, 0.550293401, 
0.924012327, 2.274642129, 92.85639578, 79.50897642, 23.72187602, 
26.86025304, 32.80504253, 222.6449054, 71.78812505, 45.76371588, 
29.93976676, 22.97515484, 0.03780441, 0.005825143, 0, 0.002867985, 
0.011948708, 0.02060423, 0.004636111, 0.015903347, 0.005473063, 
0.033988816)), class = "data.frame", row.names = c(NA, -40L))

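I want to loop over the data by the Genes and value columns and compute a result for each gene. I then want to add that result to the data frame df as a new column. The result will be either low or high.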

I am trying to do this with the following code, but it does not work:

genes <- as.character(unique(df$Genes))

library(survival)
library(survminer)

for(i in genes){
  surv_rnaseq.cut <- surv_cutpoint(
    df,
    time = "years",
    event = "patient.vital_status",
    variables = c("Genes","value"))

  df$cat <- surv_categorize(surv_rnaseq.cut)
}

In addition to the result above, I would also like a summary of surv_rnaseq.cut for all four genes, labelled with the gene names.

Any help would be appreciated. Thank you.

Answer 1

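One option is to split the data by 'Genes' (group_split), loop over the resulting list, apply the functions to each element, and bind the list elements back together after creating the new column: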

library(survminer)
library(survival)
library(dplyr)
library(purrr)

df %>%
  group_split(Genes) %>%                  # one data frame per gene
  map_dfr(~ surv_cutpoint(.x,
                          time = "years",
                          event = "patient.vital_status",
                          variables = c("Genes", "value")) %>%
            surv_categorize() %>%         # label each value "low" or "high"
            pull(value) %>%               # keep only the labels
            mutate(.x, cat = .))          # attach them to that gene's rows
# A tibble: 40 x 6
#   Samples  patient.vital_status years Genes  value cat  
#   <fct>                   <int> <dbl> <fct>  <dbl> <chr>
# 1 Sample1                     0  3.91 A1BG  0.0345 high 
# 2 Sample2                     0  1.46 A1BG  0.0177 high 
# 3 Sample3                     0  2.34 A1BG  0.0233 high 
# 4 Sample4                     0  5.01 A1BG  0.0105 high 
# 5 Sample5                     0  1.67 A1BG  0.0327 high 
# 6 Sample6                     0  1.81 A1BG  0.0376 high 
# 7 Sample7                     0  1.19 A1BG  0.0338 high 
# 8 Sample8                     1  4.69 A1BG  0      low  
# 9 Sample9                     0  2.17 A1BG  0.0200 high 
#10 Sample10                    1  1.96 A1BG  0.0124 high 
# … with 30 more rows
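
The question also asks for a per-gene summary of the cutpoints, which the pipeline above does not return. A minimal sketch of one way to get it: keep the fitted surv_cutpoint objects in a named list and bind their summaries together. The data frame toy below is hypothetical random data that only mimics the column names of df, and passing just "value" as the variable is an assumption of this sketch rather than part of the original answer.

```r
library(survminer)
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for `df` (same column names as in the question)
set.seed(1)
toy <- data.frame(
  Samples = rep(paste0("Sample", 1:20), times = 2),
  patient.vital_status = rep(rbinom(20, 1, 0.5), times = 2),
  years = rep(round(rexp(20, 0.3), 2) + 0.1, times = 2),
  Genes = rep(c("A1BG", "A2M"), each = 20),
  value = c(runif(20), runif(20, 20, 200))
)

# Fit one cutpoint per gene and keep the fitted objects in a list named by gene
cuts <- toy %>%
  split(.$Genes) %>%
  map(~ surv_cutpoint(.x,
                      time = "years",
                      event = "patient.vital_status",
                      variables = "value"))

# summary() on a surv_cutpoint object gives the cutpoint and its statistic;
# .id turns the list names into a Genes column
cut_summary <- map_dfr(cuts, summary, .id = "Genes")
cut_summary
```

With the real data, replacing toy with df should give one row per gene, so the four cutpoints can be read side by side.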
