I want to create an edge list conditioned on id:
combine = data.frame( id= c(1,1,1,2,2,2,2) , pid = c("john","tom","dick","tom","harry","dick","sick"))
Desired output:
person1 person2 id
john    tom     1
john    dick    1
tom     dick    1
tom     harry   2
tom     dick    2
... and so on
What would the R code for this be?
Allow me to slightly modify the input data frame to simplify the solution:
df <- data.frame(id = c(1,1,1,2,2,2,2),
person = c("john","tom","dick","tom","harry","dick","sick"),
stringsAsFactors = FALSE)
The next step is to merge the data frame with itself:
dfe <- merge(x = df, y = df, by = "id",
suffixes = c("1", "2"))
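All that remains is to remove the duplicate edges. Treating the edges as undirected, this can be done by keeping only the rows where person1 sorts alphabetically before person2, which also drops the self-pairs (a person matched with themselves):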
dfe <- dfe[dfe$person1 < dfe$person2,]
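Putting the steps together, here is a minimal end-to-end sketch of the merge approach using the same data as above. The row counts in the comments follow from the group sizes (3 people in id 1, 4 in id 2). Note that each pair comes out in alphabetical order, so the first edge reads dick, john rather than john, tom as in the question; the set of edges is the same.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2),
                 person = c("john", "tom", "dick", "tom", "harry", "dick", "sick"),
                 stringsAsFactors = FALSE)

# Self-join on id: every ordered pair within each id, including self-pairs.
# The clashing "person" column is renamed to person1 / person2 via suffixes.
dfe <- merge(x = df, y = df, by = "id", suffixes = c("1", "2"))
n_all <- nrow(dfe)    # 3*3 + 4*4 = 25 rows

# Keep one orientation per pair; the strict "<" also removes self-pairs.
dfe <- dfe[dfe$person1 < dfe$person2, ]
n_edges <- nrow(dfe)  # choose(3, 2) + choose(4, 2) = 9 rows
```

If the original pair order matters, a combn()-based approach keeps the names in input order instead of sorting them.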
Also, I suggest reading up on the 'igraph' package. It covers most of what you are trying to accomplish, and more.
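As a rough illustration of what igraph offers here, the sketch below turns the edge list into a graph object. It assumes the igraph package is installed and that an undirected graph is what you want; neither is stated in the question. graph_from_data_frame() reads the first two columns as edge endpoints and stores any remaining columns (here id) as edge attributes.

```r
library(igraph)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2),
                 person = c("john", "tom", "dick", "tom", "harry", "dick", "sick"),
                 stringsAsFactors = FALSE)
dfe <- merge(x = df, y = df, by = "id", suffixes = c("1", "2"))
dfe <- dfe[dfe$person1 < dfe$person2, ]

# Endpoints first, then the attribute column; directed = FALSE is an assumption.
g <- graph_from_data_frame(dfe[, c("person1", "person2", "id")], directed = FALSE)

n_people <- vcount(g)  # number of distinct people (vertices)
n_links  <- ecount(g)  # number of edges; pairs repeated across ids are kept
edge_ids <- E(g)$id    # the id each edge came from
```

From here, functions such as degree(g) or plot(g) work directly on the graph.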
Here is how I did it:
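library(tidyverse)
library(magrittr)  # for %$%; library(tidyverse) does not attach it
combine = tibble(id = c(1,1,1,2,2,2,2), pid = c("john","tom","dick","tom","harry","dick","sick"))
# All unordered pairs of pid within one id, one row per pair
combo = function(x){
  combine %>% filter(id == x) %$% t(combn(pid, 2)) %>%
    as_tibble(.name_repair = ~ c("V1", "V2")) %>% add_column(id = x)}
map_df(unique(combine$id), combo)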
  V1    V2       id
  <chr> <chr> <dbl>
1 john  tom      1.
2 john  dick     1.
3 tom   dick     1.
4 tom   harry    2.
5 tom   dick     2.
6 tom   sick     2.
7 harry dick     2.
8 harry sick     2.
9 dick  sick     2.
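For comparison, the same combn() idea can be sketched in base R without any packages. This is an alternative written for illustration, not part of the answer above; the names pairs_by_id and edges are made up here. It keeps the names in input order, so the rows match the desired output in the question.

```r
combine <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2),
                      pid = c("john", "tom", "dick", "tom", "harry", "dick", "sick"),
                      stringsAsFactors = FALSE)

# Split pid by id, then enumerate all unordered pairs within each group.
# combn() errors if a group has fewer than 2 members; not handled here.
pairs_by_id <- lapply(split(combine$pid, combine$id),
                      function(p) t(combn(p, 2)))

# Stack the per-group pair matrices and attach the id each row came from.
edges <- data.frame(do.call(rbind, pairs_by_id),
                    id = as.numeric(rep(names(pairs_by_id),
                                        sapply(pairs_by_id, nrow))),
                    stringsAsFactors = FALSE)
names(edges)[1:2] <- c("person1", "person2")
```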