Partitioning a similarity matrix


Consider a list of sets:

sets <- list(
  S1=c("A", "B", "C"),
  S2=c("A", "B", "D", "E"),
  S3=c("F", "G", "H"),
  S4=c("H", "I", "J"))

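We can build a matrix showing the overlap between the sets (because unlist(sets) keeps repeated elements, each shared element is counted once per occurrence):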

S <- sapply(sets, function(.s) unlist(sets) %in% .s)
S <- crossprod(S)
   S1 S2 S3 S4
S1  5  4  0  0
S2  4  6  0  0
S3  0  0  4  2
S4  0  0  2  4

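This matrix can be used to cluster the sets into two groups that share no elements, by partitioning it into the symmetric submatrices S[1:2,1:2] and S[3:4,3:4].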
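
As a quick check of this block structure, the sketch below rebuilds S from the sets above, extracts the two diagonal blocks by index, and tests whether the off-diagonal block is all zeros. The index ranges 1:2 and 3:4 are hard-coded for this example, and the names block1, block2 and off_diag are only illustrative.

```r
sets <- list(
  S1 = c("A", "B", "C"),
  S2 = c("A", "B", "D", "E"),
  S3 = c("F", "G", "H"),
  S4 = c("H", "I", "J"))

# Same construction as in the question
S <- crossprod(sapply(sets, function(.s) unlist(sets) %in% .s))

block1   <- S[1:2, 1:2]  # rows/columns S1, S2
block2   <- S[3:4, 3:4]  # rows/columns S3, S4
off_diag <- S[1:2, 3:4]  # overlap between the two groups

# TRUE when the two groups share no elements
all(off_diag == 0)
```

This only verifies a partition that is already known; it does not find the blocks when the sets are listed in an arbitrary order.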

Question: how can a symmetric matrix be partitioned into submatrices?

r cluster-analysis partitioning
1 Answer

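Given that you start with sets and then compute the S matrix, I'm not sure whether your question is about finding super sets of sets that contain overlapping elements, or about block-diagonal matrices. I've addressed the set problem, but the answer can also be used to partition the matrix and extract the submatrices.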

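The basic approach is to write a function that finds the super sets of sets with common elements. We apply this function first to sets, and then to the resulting super sets, so that super sets sharing a set are merged. I've extended the example to show that the solution handles a more complicated collection of sets.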

sets <- list(
  S1 = c("A", "B", "C"),
  S2 = c("F", "G", "H"),
  S3 = c("A", "B", "D", "E"),
  S4 = c("N", "O", "P"),
  S5 = c("J", "K", "L"),
  S6 = c("H", "I", "J"),
  S7 = c("M", "N", "O", "Q"))

S <- sapply(sets, function(.s) unlist(sets) %in% .s)
S <- crossprod(S) 

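make_super_set <- function(xset) {
  # xset should be a named list of sets (character vectors).
  # For each set, return the names of all sets that share
  # at least one element with it (including the set itself).
  super_sets <- list()
  for (iset in names(xset)) {
    overlaps <- sapply(xset, function(x) any(is.element(x, xset[[iset]])))
    super_sets[[iset]] <- names(xset)[overlaps]
  }
  super_sets
}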

# Find, for each set, the sets that share at least one element with it
temp_set <- make_super_set(sets)
# Merge the super sets that overlap one another
super_set <- make_super_set(temp_set)
# Keep each super set only once
super_set <- unique(super_set)
# Reorder S so that each super set forms a contiguous diagonal block
part_mat_ord <- unlist(super_set)
S[part_mat_ord, part_mat_ord]
# Make a list of the submatrices
lapply(super_set, function(x) S[x, x])

which gives

[[1]]
   S1 S3
S1  5  4
S3  4  6

[[2]]
   S2 S5 S6
S2  4  0  2
S5  0  4  2
S6  2  2  5

[[3]]
   S4 S7
S4  5  4
S7  4  6
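
Applying make_super_set twice only links sets that are at most two overlap steps apart, which is enough for the example above (S2 and S5 are joined through S6). For longer chains, one possible generalization is to repeat the step until the result stops changing. The sketch below assumes that approach; find_groups and the chain example are hypothetical and not part of the original answer, and make_super_set is rewritten here in compact form so the block runs on its own.

```r
# Compact equivalent of make_super_set from the answer
make_super_set <- function(xset) {
  lapply(xset, function(s) {
    names(xset)[sapply(xset, function(x) any(x %in% s))]
  })
}

# Hypothetical helper: repeat the merge step until nothing changes
find_groups <- function(sets) {
  current <- make_super_set(sets)
  repeat {
    nxt <- make_super_set(current)
    if (identical(nxt, current)) break
    current <- nxt
  }
  unique(current)
}

# A chain of four overlapping sets plus one isolated set
chain <- list(
  S1 = c("A", "B"),
  S2 = c("B", "C"),
  S3 = c("C", "D"),
  S4 = c("D", "E"),
  S5 = c("X", "Y"))

groups <- find_groups(chain)
groups
```

Each pass can only grow the groups, so the loop stops once every group covers a whole connected cluster of sets. The groups can then be passed to lapply(groups, function(x) S[x, x]) exactly as above.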