如何在数据集中找到行列式为零的矩阵？ - 等边距太多，无法计算

Question

我正在尝试计算评估者间的一致性（斯图尔特-麦克斯韦测试）。我已经到了一定程度了。我也发现了这个问题。然而，我在某个地方遇到了问题。如何找到数据集中的模式？为了继续分析，我需要从数据集中删除有问题的矩阵。

# 2 raters, 4 different response category and 4 subjects
R1 <- as.data.frame(expand.grid(rep(list(0:3), 4)))
R2 <- as.data.frame(expand.grid(rep(list(0:3), 4)))

# nrow(R1)*nrow(R2) = 256
pattern <- expand.grid(1:256, 1:256)
allPossible <- vector(mode = "list", length = nrow(pattern))
for(i in 1:nrow(pattern)){
  allPossible[[i]] <- rbind(R1[pattern[i, 1], ], R2[pattern[i, 2], ])
}

# 256*256 = 65536 possible rating
allPossible

# cross-tabulation
levs <- c(0:2)
list1 <- list()
list2 <- list()
list_A <- list()

for (i in 1:65536) {
  list1[[i]] <- factor(allPossible[[i]][1, ], levels = levs)
  list2[[i]] <- factor(allPossible[[i]][2, ], levels = levs)
  list_A[[i]] <- xtabs(~list1[[i]] + list2[[i]])
}

list_A

# 256 out of 65536 are completely same ratings. I have to delete them. (eg. check allPossible[1], allPossible[258])
list_B <- list_A[-seq(from = 1, to = 65536, by = 257)]

# Stuart-Maxwell Test
install.packages("DescTools")
library(DescTools)

# Now, I have 65280 different matrices
outputSM <- list()

for (i in 1:65280) {
  outputSM[[i]] <- StuartMaxwellTest(list_B[[i]])

}

outputSM

# it didn't work. Because, there are still some 3x3 matrices with the determinant 0 (zero).

# eg. problematic 3x3 matrices
StuartMaxwellTest(list_A[[260]])
StuartMaxwellTest(list_A[[820]])

# yes I found the source of some of problematic matrices 
allPossible[260]
allPossible[820]

Answer 1

问题不一定只是行列式等于零。有些矩阵的行列式为零，但检验仍然有效。以

list_A

的第二个元素为例：

> det(as.matrix(list_A[[2]]))
[1] 0
> StuartMaxwellTest(list_A[[2]])

    Stuart-Maxwell test

data:  list_A[[2]]
chi-squared = 1, df = 2, p-value = 0.6065

可以在源代码的第 25-31 行看到测试正在运行的检查。

  rowsums <- rowSums(x)
  colsums <- colSums(x)
  equalsums <- diag(x) == rowsums & diag(x) == colsums
  if (any(equalsums)) {
    x <- x[!equalsums, !equalsums]
    if (dim(x)[1] < 2) 
      stop("Too many equal marginals, cannot compute")

要提前弄清楚哪些是有效的，您可以编写一个刚刚执行该测试的小函数：

validSM <- function(x){
  rowsums <- rowSums(x)
  colsums <- colSums(x)
  equalsums <- diag(x) == rowsums & diag(x) == colsums
  if (any(equalsums)) {
    x <- x[!equalsums, !equalsums]
  }
  dim(x)[1] >= 2
  
}

然后，您可以将该函数应用于您的矩阵列表：

> valid <- sapply(list_A, validSM)
> table(valid)
valid
FALSE  TRUE 
10000 55536

这表示还有 10000 个其他矩阵无法通过测试。然后您可以只使用有效的：

list_A_small <- list_A[which(valid)]
pattern_small <- pattern[which(valid), ]

outputSM_small <- list()
for (i in seq_along(list_A_small)) {
  outputSM_small[[i]] <- StuartMaxwellTest(list_A_small[[i]])
}

另一个也许更简单的选择是使用

tryCatch()

捕获错误，并在无效时返回缺失值。例如：

outputSM <- list()
for (i in seq_along(list_A)) {
  outputSM[[i]] <- tryCatch(
    StuartMaxwellTest(list_A[[i]]), 
    error = function(x)NA)
}

如何在数据集中找到行列式为零的矩阵？ - 等边距太多，无法计算

问题描述投票：0回答：1

1个回答

最新问题

如何在数据集中找到行列式为零的矩阵？ - 等边距太多，无法计算

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1