A more efficient way to do grouped pairwise comparisons in a large dataset?


My data looks like this:

Tab4 <- read.table(text = "
  nodepair  `++`  `--`  `+-`  `-+`  `0+`  `+0`  `0-`  `-0`  `00` ES   
1 A1_A1        0     4     0     0     0     0     0     0    16 3    
2 A1_A1        0     5     0     0     0     0     0     0    16 4    
3 A1_A1        0     5     0     0     0     0     0     0    15 5    
", header = TRUE)

I wrote this code to compare every pair of "ES" groups within each node pair:

library(dplyr)
library(purrr)

ES_combs <- combn(unique(Tab4$ES), 2, simplify = FALSE)

Tab5 <- Tab4 %>%                            # compare every pair with each other
  group_split(nodepair) %>% 
  map(.f = function(df) df %>%
        map(.x = 1:length(ES_combs),
            .f = ~df %>% 
              filter(ES %in% ES_combs[[.x]]) %>% 
              summarize(nodepair = first(nodepair),
                        ES_1 = ES[1],
                        ES_2 = ES[2], 
                        across(2:10, ~as.numeric(.))))) %>%
  bind_rows()

The result is:

Tab5 <- read.table(text = "
  nodepair ES_1  ES_2   `++`  `--`  `+-`  `-+`  `0+`  `+0`  `0-`  `-0`  `00`
1 A1_A1    3     4         0     4     0     0     0     0     0     0    16
2 A1_A1    3     4         0     5     0     0     0     0     0     0    16
3 A1_A1    3     5         0     4     0     0     0     0     0     0    16
4 A1_A1    3     5         0     5     0     0     0     0     0     0    15
5 A1_A1    4     5         0     5     0     0     0     0     0     0    16
6 A1_A1    4     5         0     5     0     0     0     0     0     0    15    
", header = TRUE)

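This works, but it takes a long time when I run it on the full dataset. Is there a more efficient way to write it? I suspect this warning I get points to part of the problem: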

Warning messages:
  1: Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped
data frame and adjust accordingly.

But I don't know where to go from here.
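On the warning itself: it refers to `summarize()` returning more than one row per group. A minimal sketch of the swap it suggests, assuming dplyr 1.1.0 or later and a toy table whose column names (`minus_minus`, `zero_zero`) are hypothetical stand-ins for the real count columns, might look like this. It only addresses the deprecation message and does not by itself change how much work the nested `map()` calls do.

```r
library(dplyr)

# Toy stand-in for Tab4; the count column names are hypothetical
toy <- data.frame(
  nodepair    = c("A1_A1", "A1_A1", "A1_A1"),
  minus_minus = c(4, 5, 5),
  zero_zero   = c(16, 16, 15),
  ES          = c(3, 4, 5)
)

# One comparison (ES 3 vs ES 4), written with reframe() instead of summarize().
# reframe() allows several rows per group and recycles the length-1 results.
pair_34 <- toy %>%
  filter(ES %in% c(3, 4)) %>%
  reframe(
    nodepair = first(nodepair),
    ES_1 = ES[1],
    ES_2 = ES[2],
    across(c(minus_minus, zero_zero), as.numeric)
  )

print(pair_34)
```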

r comparison pairwise long-running-task
1 Answer

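We can join the table to itself on nodepair (every key matches, so this behaves like an inner join) and then drop the rows where an ES value is paired with itself: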

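# Self-join on nodepair: every ES value in a node pair is matched with every other
out <- merge(Tab4, Tab4[, c("nodepair", "ES")], by = "nodepair", suffixes = c("1", "2"), all = TRUE)
# Drop the rows where an ES value is paired with itself
out[out$ES1 != out$ES2, ]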

  nodepair X.... X.....1 X.....2 X.....3 X.0.. X..0. X.0...1 X..0..1 X.00. ES1 ES2
2    A1_A1     0       4       0       0     0     0       0       0    16   3   4
3    A1_A1     0       4       0       0     0     0       0       0    16   3   5
4    A1_A1     0       5       0       0     0     0       0       0    16   4   3
6    A1_A1     0       5       0       0     0     0       0       0    16   4   5
7    A1_A1     0       5       0       0     0     0       0       0    15   5   3
8    A1_A1     0       5       0       0     0     0       0       0    15   5   4
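
The output above lists each pair in both orders (3/4 and 4/3), with each row carrying the counts of its ES1 member. If the layout from the question is wanted instead (one ES_1/ES_2 label per unordered pair, with one row for each member), a rough sketch in base R could relabel and sort the merged rows. This assumes ES is numeric or otherwise orderable; the count column names (`minus_minus`, `zero_zero`) are hypothetical stand-ins for the real ones.

```r
# Toy stand-in for Tab4; the count column names are hypothetical
Tab4 <- data.frame(
  nodepair    = c("A1_A1", "A1_A1", "A1_A1"),
  minus_minus = c(4, 5, 5),
  zero_zero   = c(16, 16, 15),
  ES          = c(3, 4, 5)
)

# Self-join on nodepair, then drop self-pairs (same idea as above)
out <- merge(Tab4, Tab4[, c("nodepair", "ES")], by = "nodepair", suffixes = c("1", "2"))
out <- out[out$ES1 != out$ES2, ]

# Label each row with its unordered pair: smaller ES first, larger ES second
out$ES_1 <- pmin(out$ES1, out$ES2)
out$ES_2 <- pmax(out$ES1, out$ES2)

# Sort so the two members of every pair sit next to each other
out <- out[order(out$nodepair, out$ES_1, out$ES_2, out$ES1), ]
rownames(out) <- NULL

print(out[, c("nodepair", "ES_1", "ES_2", "minus_minus", "zero_zero")])
```

Because the join is done once per table rather than once per combination per group, this avoids the nested loops; how much faster it is on the full data would need to be measured.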