In my R script I have a graph object, "flights", and I use the following code to assign a "type" attribute to its edges:
stats <- summary(E(flights)$weight)
# 1st threshold: first quartile of the edge weights
firstThresh <- as.double(stats["1st Qu."])
# 2nd threshold: third quartile of the edge weights
secondThresh <- as.double(stats["3rd Qu."])
for (i in seq_along(E(flights))) {
  if (E(flights)[i]$weight < firstThresh)
    E(flights)[i]$type <- "C"
  else if (E(flights)[i]$weight < secondThresh)
    E(flights)[i]$type <- "M"
  else
    E(flights)[i]$type <- "L"
  cat(i, " - ")
}
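Why, with this code, is a single iteration of the "for" loop so much slower when I use another graph with more nodes and edges?
In particular, I ran a simple benchmark like this: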
start.time <- Sys.time()
# ... relevant code ...
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
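These are the results for 200 iterations of the loop on the two graphs:
Why does the same code show such a big difference?
Please try the following and let us know whether it is faster. I have not benchmarked it, because I don't have your graph.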
> g <- make_(ring(10), with_edge_(weight=1:10))
> E(g)$weight
[1] 1 2 3 4 5 6 7 8 9 10
> E(g)$kind <- 'C'
> E(g)[weight < 3]$kind <- 'A'
> E(g)[3 <= weight & weight < 5]$kind <- 'B'
> E(g)$kind
[1] "A" "A" "B" "B" "C" "C" "C" "C" "C" "C"
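Applied to the quartile thresholds from the question, the same idea might look like the sketch below. It assumes the weights live in E(g)$weight, uses quantile() in place of summary() to get the quartiles, and keeps the toy ring graph and the attribute name kind purely for illustration.

```r
library(igraph)

# Toy graph standing in for the real one (assumption: weights are in E(g)$weight).
g <- make_ring(10)
E(g)$weight <- 1:10

# Quartile thresholds, computed once from the full weight vector.
w <- E(g)$weight
firstThresh  <- quantile(w, 0.25, names = FALSE)
secondThresh <- quantile(w, 0.75, names = FALSE)

# Classify every edge in one vectorized pass, then assign the attribute once.
kind <- ifelse(w < firstThresh, "C",
        ifelse(w < secondThresh, "M", "L"))
E(g)$kind <- kind

table(E(g)$kind)
```

The point of the design is that the graph is touched by a single attribute assignment, instead of once per edge.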
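Be careful about using "type" as an attribute name. When applied to vertices, it should be of logical type, because it has a special interpretation: it encodes the two partitions of a bipartite graph.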
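As a small illustration of that special meaning, the sketch below builds a bipartite graph and inspects its vertex attribute. The graphs and variable names (bg, g2) are made up for the example; the caveat concerns the vertex attribute, so an edge attribute named "type" as in the question should not trigger it.

```r
library(igraph)

# A small bipartite graph: vertices 1 and 3 in one partition, 2 and 4 in the other.
types <- c(FALSE, TRUE, FALSE, TRUE)
bg <- make_bipartite_graph(types, edges = c(1, 2, 3, 4, 1, 4))

V(bg)$type        # logical vertex attribute marking the partition
is_bipartite(bg)  # igraph treats a graph with a "type" vertex attribute as bipartite

# An edge attribute called "type" does not make a graph bipartite.
g2 <- make_ring(4)
E(g2)$type <- "C"
is_bipartite(g2)
```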
On igraph efficiency, see: https://igraph.discourse.group/t/igraph-is-much-slower-than-networkx-when-generating-a-graph/853.
Vectorized code is much faster:
set.seed(2023)
library(igraph)
firstThresh <- 20
secondThresh <- 80
system.time({
  n <- 6072; m <- 66923
  graph <- sample_gnm(n, m)
  E(graph)$weight <- round(runif(m) * 100)
  E(graph)$type <- "L"
  E(graph)$type[which(E(graph)$weight < secondThresh)] <- "M"
  E(graph)$type[which(E(graph)$weight < firstThresh)] <- "C"
})
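To compare the two approaches directly, one could time both on the same random graph, as in the rough sketch below. The sizes are hypothetical and deliberately small so the loop finishes quickly, and the names g_loop and g_vec are made up for the example. A likely reason the loop scales badly is that every E(graph)[i]$type <- ... goes through R's replacement mechanism and updates the whole graph object, so the cost of one iteration grows with the size of the graph; treat that as an assumption rather than a measured fact.

```r
library(igraph)
set.seed(2023)

# Hypothetical sizes, kept small so the loop version finishes quickly.
n <- 200; m <- 1000
g <- sample_gnm(n, m)
E(g)$weight <- round(runif(m) * 100)
firstThresh <- 20
secondThresh <- 80

# Loop version: one attribute assignment per edge.
g_loop <- g
loop_time <- system.time(
  for (i in seq_len(ecount(g_loop))) {
    w <- E(g_loop)[i]$weight
    E(g_loop)[i]$type <- if (w < firstThresh) "C" else if (w < secondThresh) "M" else "L"
  }
)

# Vectorized version: three attribute assignments in total.
g_vec <- g
vec_time <- system.time({
  E(g_vec)$type <- "L"
  E(g_vec)$type[which(E(g_vec)$weight < secondThresh)] <- "M"
  E(g_vec)$type[which(E(g_vec)$weight < firstThresh)] <- "C"
})

# Both approaches should produce the same labels.
identical(E(g_loop)$type, E(g_vec)$type)

# Elapsed seconds for each approach.
c(loop = loop_time[["elapsed"]], vectorized = vec_time[["elapsed"]])
```

Increasing n and m and rerunning should show whether the gap widens on larger graphs.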