How to calculate a (co-)occurrence matrix from a data frame with several columns using R?

Question

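I am new to R and currently working with collaboration data in the form of an edge list with 32 columns and around 200,000 rows. I want to create a (co-)occurrence matrix based on the interactions between countries. However, I want to count the number of interactions by the total number of times an object occurs.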

Basic example of the desired result

If "England" occurs three times in one row and "China" only once, the result should be the following matrix.

         England  China
England    3        3
China      3        1
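
The rule behind this small matrix can be sketched for a single row. This is only one reading of the example, and it is an assumption: the diagonal holds each country's own count, and each off-diagonal cell holds the product of the two counts. Taking the larger of the two counts would reproduce the same numbers, so the example alone does not settle which rule is meant. The names `counts` and `m` are made up for illustration.

```r
# Counts for one hypothetical row: England three times, China once
counts <- c(England = 3, China = 1)

# Off-diagonal cells: product of the two counts (assumed rule)
m <- outer(counts, counts)

# Diagonal cells: the country's own count rather than its square
diag(m) <- counts

m
#         England China
# England       3     3
# China         3     1
```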

Reproducible example

df <- data.frame(ID = c(1,2,3,4),
                 V1 = c("England", "England", "China", "England"),
                 V2 = c("Greece", "England", "Greece", "England"),
                 V32 = c("USA", "China", "Greece", "England"))

So the example data frame currently looks like this:

ID  V1       V2       ...   V32
1   England  Greece         USA
2   England  England        China
3   China    Greece         Greece
4   England  England        England
.
.
.

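Desired result

I want to count the (co-)occurrences row by row and independently of order, to get a (co-)occurrence matrix that accounts for the low frequency of edge loops (e.g. England-England), which leads to the following result: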

         China   England   Greece   USA
China    2        2         2        0
England  2        6         1        1
Greece   2        1         3        1
USA      0        1         1        1

What has been tried so far

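I have used igraph to get an adjacency matrix with co-occurrences. However, it counts (as expected) no more than two interactions between the same two objects, which in some cases leaves me with values far below the actual frequency of the objects per row/publication.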

df <- data.frame(ID = c(1,2,3,4),
                 V1 = c("England", "England", "China", "England"),
                 V2 = c("Greece", "England", "Greece", "England"),
                 V32 = c("USA", "China", "Greece", "England"))

# remove ID column
df[1] <- list(NULL)

# calculate co-occurrences and return as data frame
library(igraph)
library(Matrix)

# note: only the first two columns (V1, V2) are read as the edge list;
# any further columns are treated as edge attributes
countrydf <- graph_from_data_frame(df)
countrydf2 <- as_adjacency_matrix(countrydf, type = "both", edges = FALSE)
countrydf3 <- as.data.frame(as.matrix(forceSymmetric(countrydf2)))

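         China   England   Greece   USA
China    0        0         1        0
England  0        2         1        0
Greece   1        1         0        0
USA      0        0         0        0

I assume there must be a simple solution using base and/or dplyr and/or table and/or reshape2, similar to [1], [2], [3], [4] or [5], but nothing has worked so far and I have not been able to adapt that code to my needs. I also tried using [6] as a basis, but the same problem applies there.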

library(tidyr)
library(dplyr)
library(stringr)


# collapse observations into one column

df2 <- df %>% unite(concat, V1:V32, sep = ",")

# calculate weights

df3 <- df2$concat %>%
  str_split(",") %>%
  lapply(function(x){
    expand.grid(x,x,x,x, w = length(x), stringsAsFactors = FALSE)
  }) %>%
  bind_rows

df4 <- apply(df3[, -5], 1, sort) %>%
  t %>%
  data.frame(stringsAsFactors = FALSE) %>%
  mutate(w = df3$w)

I would be glad if someone could point me in the right direction.


Answer 1

2 votes

There may be a better way, but try:

Answer 2

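Here is one approach using the dplyr and tidyr packages. The whole idea is to create a data frame holding each country's occurrences per row, and then join it with itself.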
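
The answer's own code is not reproduced on this page. The following is only a sketch of how the described idea might look, not the answerer's code. It assumes dplyr 1.1.0 or later (for the `relationship` argument) and assumes the counting rule implied by the desired result: off-diagonal cells sum the products of the per-row counts, diagonal cells sum the per-row counts themselves. The names `long`, `pairs`, `w` and `cooc` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question (columns V3 ... V31 omitted)
df <- data.frame(ID = c(1, 2, 3, 4),
                 V1 = c("England", "England", "China", "England"),
                 V2 = c("Greece", "England", "Greece", "England"),
                 V32 = c("USA", "China", "Greece", "England"))

# One line per (ID, country) with the number of occurrences in that row
long <- df %>%
  pivot_longer(-ID, names_to = "col", values_to = "country") %>%
  filter(!is.na(country)) %>%
  count(ID, country, name = "n")

# Join the table with itself on ID to pair up all countries of the same row
pairs <- long %>%
  inner_join(long, by = "ID", suffix = c("_a", "_b"),
             relationship = "many-to-many") %>%
  # assumed rule: own count on the diagonal, product of counts elsewhere
  mutate(w = if_else(country_a == country_b, n_a, n_a * n_b)) %>%
  group_by(country_a, country_b) %>%
  summarise(w = sum(w), .groups = "drop")

# Spread into a square table; pairs that never co-occur become 0
cooc <- xtabs(w ~ country_a + country_b, data = pairs)
cooc
```

On the example data this gives the matrix from the "Desired result" section. With 32 columns and roughly 200,000 rows the self-join can grow large, so it may be worth trying on a subset first.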

Answer 3

0 votes

An option using base::table:
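
The answer's own code is not reproduced on this page. As a hedged sketch of what a `table`-based solution could look like (not necessarily what the answerer wrote): build a row-by-country count table, take its cross product for the off-diagonal cells, and overwrite the diagonal with the total occurrences. The diagonal rule is an assumption inferred from the desired result, and the names `countries`, `ids`, `counts` and `cooc` are made up for illustration.

```r
# Example data from the question (columns V3 ... V31 omitted)
df <- data.frame(ID = c(1, 2, 3, 4),
                 V1 = c("England", "England", "China", "England"),
                 V2 = c("Greece", "England", "Greece", "England"),
                 V32 = c("USA", "China", "Greece", "England"))

# Stack all country columns (column by column) and repeat the IDs to match
countries <- unlist(df[-1], use.names = FALSE)
ids <- rep(df$ID, times = ncol(df) - 1)

# Rows: IDs, columns: countries, cells: occurrences per row (NAs are dropped)
counts <- table(ids, countries)

# Off-diagonal cells: sum over rows of count_i * count_j
cooc <- crossprod(counts)

# Diagonal cells: total occurrences of each country (assumed rule)
diag(cooc) <- colSums(counts)

cooc
```

On the example data this gives the matrix from the "Desired result" section. Because only one count table and one cross product are involved, this variant should stay cheap even for a few hundred thousand rows, as long as the number of distinct countries is moderate.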

Question votes: 6, answers: 3