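I want to compute a large number of correlations (millions) between pairs of columns, so I am concerned about computation time.
I suspect that computing a Pearson correlation (based on values) in R is faster than computing a Spearman correlation (based on ranks). Is that correct?
How can I find out? Thank you.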
You can use the rbenchmark package for this.
library(rbenchmark)
#' 1,000 rows, 100 repetitions
x1 <- rnorm(1000)
y1 <- rnorm(1000)
benchmark(
spearman = {
cor(x1, y1, method = "spearman")
},
pearson = {
cor(x1, y1, method = "pearson")
},
replications = 100
)
#> test replications elapsed relative user.self sys.self user.child
#> 2 pearson 100 0.001 1 0.002 0.000 0
#> 1 spearman 100 0.013 13 0.013 0.001 0
#> sys.child
#> 2 0
#> 1 0
# 1,000,000 rows, 100 repetitions
x2 <- rnorm(1000000)
y2 <- rnorm(1000000)
benchmark(
spearman = {
cor(x2, y2, method = "spearman")
},
pearson = {
cor(x2, y2, method = "pearson")
},
replications = 100
)
#> test replications elapsed relative user.self sys.self user.child
#> 2 pearson 100 0.726 1.00 0.725 0.001 0
#> 1 spearman 100 43.974 60.57 43.341 0.605 0
#> sys.child
#> 2 0
#> 1 0
#' This confirms your assumption: Pearson is considerably faster than Spearman.
#' The gap widens as the number of rows grows: Spearman takes about 13 times as long
#' at 1,000 rows and about 60 times as long at 1,000,000 rows.
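
Since the question involves millions of column pairs, it may help to remember that Spearman's correlation is Pearson's correlation computed on ranks. The sketch below assumes the columns fit into a single numeric matrix; the matrix `m` and its dimensions are made up for illustration and are not from the question. It ranks each column once and then runs Pearson on the ranked matrix, which avoids re-ranking a column for every pair it appears in, as happens when looping over pairs with `cor(x, y, method = "spearman")`.

```r
# Hypothetical example data: 1,000 rows and 50 columns (not from the question)
set.seed(42)
m <- matrix(rnorm(1000 * 50), nrow = 1000, ncol = 50)

# Reference result: Spearman correlation for all column pairs in one call
spearman_direct <- cor(m, method = "spearman")

# Alternative: rank every column once (average ranks for ties, the default),
# then compute Pearson correlations on the ranks
ranks <- apply(m, 2, rank)
spearman_via_ranks <- cor(ranks, method = "pearson")

# The two matrices should agree up to floating-point tolerance
all.equal(spearman_direct, spearman_via_ranks)
```

Whether this is actually faster for a given data set is worth checking with `benchmark()` in the same way as above, because the cost of ranking depends on the number of rows.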