I have data like this:
mydf <- data.frame(p1=c('a','a','a','b','b','b','c','c','d'),
                   p2=c('b','c','d','c','d','e','d','e','e'),
                   p3=c('a','a','c','c','d','d','d','a','a'),
                   p4=c('a','a','b','c','c','e','d','a','b'),
                   p5=c('a','b','c','d','e','b','b','c','c'),
                   source=c('a','b','c','d','e','e','a','b','d'))
which gives:
  p1 p2 p3 p4 p5 source
1  a  b  a  a  a      a
2  a  c  a  a  b      b
3  a  d  c  b  c      c
4  b  c  c  c  d      d
5  b  d  d  c  e      e
6  b  e  d  e  b      e
7  c  d  d  d  b      a
8  c  e  a  a  c      b
9  d  e  a  b  c      d
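I want to create an adjacency matrix that holds the number of connections between source and the rest of the columns. For example (only the a and b columns are filled in here):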
  a b c d e
a 4 2
b 5 1
c 1 1
d 1 2
e 0 3
Is there an easy way to do this? Any help would be appreciated.
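We can reshape the data to long format, use count() to tally each combination of source and value, and then reshape the counts back to wide format.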
library(dplyr)
library(tidyr)
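mydf %>%
  # stack p1..p5 into name/value pairs, keeping source on every row
  pivot_longer(cols = -source) %>%
  # number of rows for each source/value pair (stored in column n)
  count(source, value) %>%
  # one column per value; pairs that never occur are filled with 0
  pivot_wider(names_from = value, values_from = n, values_fill = list(n = 0))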
#   source     a     b     c     d     e
#   <fct>  <int> <int> <int> <int> <int>
# 1 a          4     2     1     3     0
# 2 b          5     1     3     0     1
# 3 c          1     1     2     1     0
# 4 d          1     2     4     2     1
# 5 e          0     3     1     3     3
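
If a plain matrix with row and column names is preferred over a tibble, the same counts can probably be obtained in base R with table(). This is only a sketch and not part of the answer above: it assumes every column other than source should be counted, and the names value_cols, src, val, lv and adj_mat are made up for illustration.

```r
mydf <- data.frame(p1=c('a','a','a','b','b','b','c','c','d'),
                   p2=c('b','c','d','c','d','e','d','e','e'),
                   p3=c('a','a','c','c','d','d','d','a','a'),
                   p4=c('a','a','b','c','c','e','d','a','b'),
                   p5=c('a','b','c','d','e','b','b','c','c'),
                   source=c('a','b','c','d','e','e','a','b','d'))

# Assumption: every column except source is a target column
value_cols <- setdiff(names(mydf), "source")

# Stack the target columns one after another (p1 first, then p2, ...)
val <- unlist(lapply(mydf[value_cols], as.character), use.names = FALSE)

# Repeat the whole source column once per target column so it lines up with val
src <- rep(as.character(mydf$source), times = length(value_cols))

# Shared level set so the result is square, including pairs with zero counts
lv <- sort(unique(c(src, val)))

# Cross-tabulate source against value, then drop the table class
adj_mat <- unclass(table(source = factor(src, levels = lv),
                         value  = factor(val, levels = lv)))

adj_mat
```

Rows are the source values and columns are the values found in p1 to p5, so adj_mat["a", "b"] is the number of times b appears in a row whose source is a. Using a shared set of factor levels keeps the matrix square even when some value never occurs as a source or as a target.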