Intersection of items between groups in a data.table

Question (votes: 2, answers: 2)

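I have a data.table with two columns: groupID and color. I want to compute the length of the intersection of colors for every pair of groups, i.e. a pairwise intersect operation across all groups. There are similar posts online, but none of them does exactly what I need.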

require(data.table)


set.seed(1)
x <- data.table(
  groupID = paste0(sample(LETTERS), sample(LETTERS, replace = TRUE)),
  color = sapply(1:length(LETTERS), function(x) sample(colors()[1:10])[1:sample(5:10)[1]])
)

x <- x[, .(color = unlist(color)), keyby = groupID]

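The table below does not contain the correct values (intersectionLength is filled with random numbers), but it shows the shape of the result I am after: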

groups <- x[, .N, keyby = groupID][,groupID]; results <- CJ(groups, groups)
results[, intersectionLength := sapply(1:nrow(results), function(x) sample(5:10)[1])]

Edit

This post has a similar question. How can I apply it to my problem?

r data.table grouping intersection
2 Answers
0
votes

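Here is an option using Map: it walks over the pairwise elements of the two group columns, takes the intersect of their 'color' values, and returns the length of that intersection.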

library(data.table)
CJ(group1 = unique(x$groupID), group2 = unique(x$groupID))[,
   .(group1, group2, intersectionLength = unlist(Map(function(u, v) 
   length(intersect(x$color[x$groupID == u], 
      x$color[x$groupID == v])), group1, group2)))]
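To check what the Map approach returns, it can help to run it on a table small enough to verify by hand. The sketch below assumes a hypothetical three-group table named toy (not the question's data) and applies the same expression to it.

```r
library(data.table)

# Hypothetical toy data with known overlaps (not the data from the question):
# A and B share "blue" and "green"; C shares nothing with A or B.
toy <- data.table(
  groupID = c("A", "A", "A", "B", "B", "C"),
  color   = c("red", "blue", "green", "blue", "green", "white")
)

groups <- unique(toy$groupID)

# All ordered pairs of groups, including self-pairs and flipped pairs
res <- CJ(group1 = groups, group2 = groups)[,
  .(group1, group2,
    intersectionLength = unlist(Map(function(u, v)
      length(intersect(toy$color[toy$groupID == u],
                       toy$color[toy$groupID == v])),
      group1, group2)))]

print(res)
```

Because CJ builds the full cross product, the result has one row per ordered pair: self-pairs report the number of colors in the group, and pairs with no shared color are kept with a length of 0.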

0
votes

Here is another option, which also removes the duplicates that arise when the two groups are flipped:

ans <- x[x, on=.(color), allow.cartesian=TRUE][groupID!=i.groupID, 
    .(intersectionLength=uniqueN(color)), 
    .(g1=pmin(groupID, i.groupID), g2=pmax(groupID, i.groupID))]

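Output (note that self-pairs such as AT/AT appear here, although the groupID != i.groupID filter above would drop them):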

     g1 g2 intersectionLength
  1: AT AT                  6
  2: AT CZ                  3
  3: AT DO                  6
  4: AT EW                  5
  5: AT FT                  4
 ---                         
347: XL YL                  3
348: XL ZF                  6
349: YL YL                  5
350: YL ZF                  3
351: ZF ZF                  7
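
The self-join approach can be traced step by step on a small table. This is a minimal sketch on a hypothetical three-group table named toy (not the question's data); the intermediate name joined is introduced here only for illustration.

```r
library(data.table)

# Hypothetical toy data (not the data from the question)
toy <- data.table(
  groupID = c("A", "A", "A", "B", "B", "C"),
  color   = c("red", "blue", "green", "blue", "green", "white")
)

# Self-join on color: each row pairs two groups that share that color.
# groupID comes from the outer table, i.groupID from the table passed as i.
joined <- toy[toy, on = .(color), allow.cartesian = TRUE]

# Drop self-pairs, then use pmin/pmax so that (A, B) and (B, A)
# fall into the same unordered pair before counting distinct colors.
pairs <- joined[groupID != i.groupID,
  .(intersectionLength = uniqueN(color)),
  by = .(g1 = pmin(groupID, i.groupID), g2 = pmax(groupID, i.groupID))]

print(pairs)
```

On this toy table only the A/B pair is returned. Since the join keeps only matching colors, pairs with an empty intersection (A/C and B/C here) produce no row at all, unlike the CJ-based approach, which reports them with a length of 0.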