Select the maximum value among two or more labels


Here is some dummy data:

df=structure(list(group1 = c("0.1531181", "0.1537821", "0.284066", 
"0.7549542", "0.2495559"), group2 = c("0.3116818", "0.5837542", 
"0.430886", "0.7856033", "0.6351635"), group3 = c(7.80191002743109e-17, 
2.22008198884117e-49, 4.64379480824993e-13, 0.0476184338005978, 
2.2062018808144e-39)), row.names = c("C4orf39", "FAM89A", "FMNL1", 
"CYB5R2", "CHST2"), class = "data.frame")

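As you can see, 'df' has three columns: group1, group2 and group3. Each row is a gene, identified by its row name.

Now I would like to build a function in R that automatically determines, for each row, which column holds the largest value and assigns that group number to a new column, 'GeneCluster'. The final result should look like this: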

         group1    group2    GeneCluster
Gene1     0.8       0.7      1
Gene2    -0.4       0.25     2

Of course, the number of columns (groups) can be two or more.

Any help would be greatly appreciated!

r algorithm
1 Answer

Here you go. I had to tweak the example a little in order to test it:

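library(tidyverse)

# Two-column version of the data; note that the values are stored as character strings
df = structure(list(group1 = c("0.1531181", "1", "0.284066", "0.7549542", "0.2495559"),
                    group2 = c("0.3116818", "0.5837542", "0.430886", "0.7856033", "0.6351635")),
               row.names = c("C4orf39", "FAM89A", "FMNL1", "CYB5R2", "CHST2"),
               class = "data.frame")

df %>% 
  mutate_all(as.numeric) %>%   # convert first, so that > compares numbers rather than text
  mutate(GeneCluster = if_else(group1 > group2, 1, 2))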
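A side note on the data: the group columns in the example are stored as character strings, and R compares character values as text rather than as numbers. A tiny illustration of why converting with as.numeric() first is the safer route (the variable names here are just for the demo):

```r
# Character values are compared as text, character by character
as_text <- "10" > "9"                            # "1" sorts before "9"
as_number <- as.numeric("10") > as.numeric("9")  # numeric comparison

print(as_text)    # FALSE
print(as_number)  # TRUE
```

With values that all look like "0.xxx" the text comparison usually happens to agree with the numeric one, but it is not something to rely on.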

Here is my second attempt, which should be general enough:

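set.seed(42)
df %>% 
  mutate_all(as.numeric) %>%                         # the columns are stored as character
  mutate(group3 = group2 * rnorm(5) + .5,            # made-up third group, for testing only
         row = row_number()) %>%                     # row id, so each gene can be regrouped
  pivot_longer(-row) %>%                             # one row per (gene, group) pair
  group_by(row) %>% 
  mutate(max_value = max(value),
         # "\\d+" rather than a single digit, so that group10 and above also work
         group_number = str_extract(name, "\\d+") %>% as.numeric(),
         group_max_value = if_else(value == max_value, group_number, NA_real_)) %>%
  fill(group_max_value, .direction = "updown") %>%   # copy the winning group to every row of the gene
  select(-group_number, -max_value) %>% 
  pivot_wider(names_from = name, values_from = value) %>% 
  ungroup()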
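If you would rather have this wrapped in a function, as the question asks, here is a minimal base R sketch built on max.col(). The name assign_gene_cluster() and its cols argument are my own invention, not from any package, and it assumes the selected columns appear in group order (group1, group2, ...), so that the column position equals the group number:

```r
# Dummy data from the question (group1/group2 are character, group3 is numeric)
df <- data.frame(
  group1 = c("0.1531181", "0.1537821", "0.284066", "0.7549542", "0.2495559"),
  group2 = c("0.3116818", "0.5837542", "0.430886", "0.7856033", "0.6351635"),
  group3 = c(7.80191002743109e-17, 2.22008198884117e-49, 4.64379480824993e-13,
             0.0476184338005978, 2.2062018808144e-39),
  row.names = c("C4orf39", "FAM89A", "FMNL1", "CYB5R2", "CHST2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds a GeneCluster column holding the position
# (within `cols`) of the column with the largest value in each row.
assign_gene_cluster <- function(data, cols = names(data)) {
  # Convert every selected column to numeric and bind them into a matrix
  m <- do.call(cbind, lapply(data[cols], as.numeric))
  # max.col() returns the column index of the row maximum;
  # ties.method = "first" picks the leftmost column when there is a tie
  data$GeneCluster <- max.col(m, ties.method = "first")
  data
}

result <- assign_gene_cluster(df)
print(result)
```

With the dummy data from the question, every gene ends up in cluster 2, because group2 holds the largest value in each row. The gene names stay in the row names, and the function works for any number of group columns.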