Here is some dummy data:
df <- structure(list(group1 = c("0.1531181", "0.1537821", "0.284066", "0.7549542", "0.2495559"),
                     group2 = c("0.3116818", "0.5837542", "0.430886", "0.7856033", "0.6351635"),
                     group3 = c(7.80191002743109e-17, 2.22008198884117e-49, 4.64379480824993e-13,
                                0.0476184338005978, 2.2062018808144e-39)),
                row.names = c("C4orf39", "FAM89A", "FMNL1", "CYB5R2", "CHST2"),
                class = "data.frame")
As you can see, `df` has three columns: group1, group2 and group3. Each row name is a gene name.
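Now I want to write a function in R that, for each row, automatically determines which column holds the largest value and writes that group number into a new column, `GeneCluster`. The end result should look like this: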
        group1   group2   GeneCluster
Gene1   0.8      0.7      1
Gene2   -0.4     0.25     2
Of course, the number of columns (groups) can be two or more.
Any help would be greatly appreciated!
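The row-wise logic asked for above (find the column holding each row's maximum, then record its group number) can be sketched in base R with `max.col()`. This is a minimal sketch, assuming the group columns are named `group<N>` and may be stored as character strings, as in `df`; the function name `assign_gene_cluster` and the `prefix` argument are hypothetical.

```r
# Hypothetical helper: adds a GeneCluster column holding the number of the
# group column with the largest value in each row.
assign_gene_cluster <- function(data, prefix = "group") {
  # Pick up any number of columns named group1, group2, ..., group10, ...
  group_cols <- grep(paste0("^", prefix, "[0-9]+$"), names(data), value = TRUE)
  # Columns may be stored as strings (or factors), so convert them to numbers;
  # matrix() keeps a rows x groups shape even when there is only one row
  values <- matrix(unlist(lapply(data[group_cols],
                                 function(x) as.numeric(as.character(x)))),
                   nrow = nrow(data))
  # max.col() returns, per row, the index of the column holding the maximum;
  # ties.method = "first" makes the result deterministic when values tie
  winner <- max.col(values, ties.method = "first")
  # Map the column index back to the group number encoded in the column name
  data$GeneCluster <- as.integer(sub(paste0("^", prefix), "", group_cols))[winner]
  data
}

# Example using the two genes from the desired output above
toy <- data.frame(group1 = c("0.8", "-0.4"), group2 = c("0.7", "0.25"),
                  row.names = c("Gene1", "Gene2"), stringsAsFactors = FALSE)
res <- assign_gene_cluster(toy)
res$GeneCluster
# [1] 1 2
```

Because the group number is read from the column name rather than from the column position, the columns do not need to be in order, and names such as `group10` work too. Row names (the gene names) are left untouched.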
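Here you go. I had to tweak your example a bit so it could be tested (FAM89A's group1 value is set to 1 and group3 is dropped):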
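library(tidyverse)

# Modified example: FAM89A's group1 is "1", so group1 wins in that row
df <- structure(list(group1 = c("0.1531181", "1", "0.284066", "0.7549542", "0.2495559"),
                     group2 = c("0.3116818", "0.5837542", "0.430886", "0.7856033", "0.6351635")),
                row.names = c("C4orf39", "FAM89A", "FMNL1", "CYB5R2", "CHST2"),
                class = "data.frame")

# The values are stored as strings, so convert them before comparing
df %>%
  mutate(across(everything(), as.numeric),
         GeneCluster = if_else(group1 > group2, 1, 2))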
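Converting to numeric matters because, when `group1` and `group2` hold character strings, `>` compares them as text (by collation order) rather than as numbers. That can happen to give the right answer for values like these, but it breaks for numbers with a different number of digits before the decimal point. A quick base R check:

```r
# Comparing character strings is done by collation order, not numerically
"10" > "9"                           # FALSE: "1" sorts before "9"
as.numeric("10") > as.numeric("9")   # TRUE

# After conversion, the question's Gene2 values compare as intended
as.numeric("-0.4") > as.numeric("0.25")  # FALSE, so Gene2 goes to group 2
```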
Here is my second attempt, which should be general enough to handle any number of group columns:
set.seed(42)
df %>%
  rownames_to_column("gene") %>%                        # keep the gene names as a column
  mutate(across(starts_with("group"), as.numeric),      # strings -> numbers
         group3 = group2 * rnorm(5) + .5) %>%           # random third group, for testing only
  pivot_longer(-gene) %>%                               # one row per gene/group pair
  group_by(gene) %>%
  mutate(max_value = max(value),
         group_number = as.numeric(str_extract(name, "\\d+")),  # "group12" -> 12
         GeneCluster = if_else(value == max_value, group_number, NA_real_)) %>%
  fill(GeneCluster, .direction = "updown") %>%          # copy the winning group to every row of the gene
  ungroup() %>%
  select(-group_number, -max_value) %>%
  pivot_wider(names_from = name, values_from = value)
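In the long-format pipeline above, the `if_else()` plus `fill()` pair can also be collapsed into one indexing step with `which.max()`, which returns the position of the first maximum within each gene. This is a minimal sketch, assuming dplyr, tidyr and tibble (all loaded by tidyverse); the small `toy` data frame is hypothetical test data copied from the desired output in the question.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical test data mirroring the desired output in the question
toy <- data.frame(group1 = c("0.8", "-0.4"), group2 = c("0.7", "0.25"),
                  row.names = c("Gene1", "Gene2"), stringsAsFactors = FALSE)

result <- toy %>%
  rownames_to_column("gene") %>%
  mutate(across(starts_with("group"), as.numeric)) %>%
  pivot_longer(starts_with("group"), names_to = "name", values_to = "value") %>%
  group_by(gene) %>%
  # which.max() picks the row (within the gene) with the largest value;
  # its group number is read from the column name, e.g. "group2" -> 2
  mutate(GeneCluster = as.numeric(sub("^group", "", name[which.max(value)]))) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value)

result
```

Since `which.max()` always returns a single position, ties resolve to the first group and each gene stays on one row after `pivot_wider()`.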