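I have a data.table with two columns: one is groupID and the other is color. I want to find the intersection length for every pair of groups, i.e. a pairwise intersect operation. There are similar posts online, but none is exactly what I am looking for.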
require(data.table)
set.seed(1)
x <- data.table(
groupID = paste0(sample(LETTERS), sample(LETTERS, replace = TRUE)),
color = sapply(1:length(LETTERS), function(x) sample(colors()[1:10])[1:sample(5:10)[1]])
)
x <- x[, .(color = unlist(color)), keyby = groupID]
The table below does not contain the correct values, but the result should look like this:
groups <- x[, .N, keyby = groupID][, groupID]
results <- CJ(groups, groups)
# placeholder values only, not real intersection lengths
results[, intersectionLength := sapply(1:nrow(results), function(x) sample(5:10)[1])]
Edit
This post has a similar question. How can I apply it to my problem?
Here is an option using Map that compares the pairwise elements of the group columns, extracts the intersecting 'color' values, and takes their length:
library(data.table)
CJ(group1 = unique(x$groupID), group2 = unique(x$groupID))[,
.(group1, group2, intersectionLength = unlist(Map(function(u, v)
length(intersect(x$color[x$groupID == u],
x$color[x$groupID == v])), group1, group2)))]
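The Map call above rescans x for every pair of groups. As a minimal sketch of one possible variation (not part of the original answer), the colors can be split by group once and then looked up by name. The toy table and the names toy, colorsByGroup and pairs are hypothetical and used only for illustration:

```r
library(data.table)

# Toy data (hypothetical, for illustration only)
toy <- data.table(
  groupID = c("A", "A", "A", "B", "B", "C"),
  color   = c("red", "blue", "green", "blue", "green", "red")
)

# Split the colors by group once, so each pair becomes a cheap list lookup
colorsByGroup <- split(toy$color, toy$groupID)

# All ordered pairs of groups, including self-pairs
pairs <- CJ(group1 = names(colorsByGroup), group2 = names(colorsByGroup))

# Length of the color intersection for every pair
pairs[, intersectionLength := mapply(
  function(u, v) length(intersect(colorsByGroup[[u]], colorsByGroup[[v]])),
  group1, group2,
  USE.NAMES = FALSE
)]

print(pairs)
```

Unlike a join-based approach, this keeps pairs that share no color and reports them with a length of 0.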
Here is another option, which removes the duplicates that arise when the two groups are flipped:
ans <- x[x, on=.(color), allow.cartesian=TRUE][groupID!=i.groupID,
.(intersectionLength=uniqueN(color)),
.(g1=pmin(groupID, i.groupID), g2=pmax(groupID, i.groupID))]
Output (self-pairs such as AT AT appear only when the groupID != i.groupID filter is omitted):
g1 g2 intersectionLength
1: AT AT 6
2: AT CZ 3
3: AT DO 6
4: AT EW 5
5: AT FT 4
---
347: XL YL 3
348: XL ZF 6
349: YL YL 5
350: YL ZF 3
351: ZF ZF 7
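To see what the self-join computes, here is a small sketch on hypothetical toy data (toy, joined and ans2 are illustrative names, not from the original answer). Joining the table to itself on color yields one row per pair of groups and shared color; pmin and pmax then put each pair in a canonical order so that flipped pairs collapse into one:

```r
library(data.table)

# Toy data (hypothetical, for illustration only)
toy <- data.table(
  groupID = c("A", "A", "A", "B", "B", "C"),
  color   = c("red", "blue", "green", "blue", "green", "red")
)

# Self-join on color: one row per (group, other group, shared color)
joined <- toy[toy, on = .(color), allow.cartesian = TRUE]

# Drop self-pairs, order each pair canonically, count distinct shared colors
ans2 <- joined[groupID != i.groupID,
  .(intersectionLength = uniqueN(color)),
  keyby = .(g1 = pmin(groupID, i.groupID), g2 = pmax(groupID, i.groupID))]

print(ans2)
```

On this toy data the result has two rows: A and B share 2 colors, A and C share 1. B and C have no color in common, so that pair produces no rows in the join and is absent from the result rather than reported as 0.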