Is Pearson correlation faster than Spearman correlation in R?


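I want to compute a large number (millions) of correlations between pairs of columns, so I am concerned about computation time.

I suspect that in R, computing a Pearson correlation (based on values) is faster than computing a Spearman correlation (based on ranks). Is that correct?

How can I find out? Thank you.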

r performance correlation pearson-correlation pearson
1 Answer

You can use the rbenchmark package for this.

library(rbenchmark)

  # 1,000 rows, 100 replications
  
  x1 <- rnorm(1000)
  y1 <- rnorm(1000)
  
  benchmark(
    spearman = {
      cor(x1, y1, method = "spearman")
    },
    pearson = {
      cor(x1, y1, method = "pearson")
    },
    replications = 100
  )
#>       test replications elapsed relative user.self sys.self user.child
#> 2  pearson          100   0.001        1     0.002    0.000          0
#> 1 spearman          100   0.013       13     0.013    0.001          0
#>   sys.child
#> 2         0
#> 1         0
  
  # 1,000,000 rows, 100 replications
  
  x2 <- rnorm(1000000)
  y2 <- rnorm(1000000)
  
  benchmark(
    spearman = {
      cor(x2, y2, method = "spearman")
    },
    pearson = {
      cor(x2, y2, method = "pearson")
    },
    replications = 100
  )
#>       test replications elapsed relative user.self sys.self user.child
#> 2  pearson          100   0.726     1.00     0.725    0.001          0
#> 1 spearman          100  43.974    60.57    43.341    0.605          0
#>   sys.child
#> 2         0
#> 1         0
  
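  # This confirms your assumption: Pearson is considerably faster than Spearman
  # (about 13x at 1,000 rows and about 60x at 1,000,000 rows in the runs above).
  # The gap widens as the number of rows grows, because Spearman has to rank
  # the values before correlating them.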
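
Since a Spearman correlation is a Pearson correlation computed on ranks, one possible way to reduce the cost when many column pairs are involved is to rank each column once and then run Pearson on the ranks, rather than calling cor(..., method = "spearman") separately for every pair. The following is only a minimal sketch under assumptions: the matrix m and its size are made up for illustration, and it has not been timed against the benchmark above.

```r
set.seed(42)

# Hypothetical data: 1,000 rows and 50 columns (an assumption for illustration)
m <- matrix(rnorm(1000 * 50), nrow = 1000, ncol = 50)

# Rank every column once; rank() assigns average ranks to ties by default
r <- apply(m, 2, rank)

# Pearson on the ranks yields the Spearman correlation matrix for all pairs
spearman_via_ranks <- cor(r, method = "pearson")

# Reference result computed directly, for comparison
spearman_direct <- cor(m, method = "spearman")

# Check that both approaches agree up to numerical tolerance
same <- isTRUE(all.equal(spearman_via_ranks, spearman_direct))
print(same)
```

Passing a whole matrix to cor() also computes all pairwise correlations in a single call, which is likely to be cheaper than looping over pairs in R code. Whether ranking up front helps in your case depends on how the pairs are generated, so it is worth benchmarking on your own data.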