A faster way to compute the Euclidean distance matrix between the rows of two matrices?


First of all, this is NOT the same question as "calculating Euclidean distance between two matrices".

Suppose I have two matrices x and y, for example,

set.seed(1)
x <- matrix(rnorm(15), ncol=5)
y <- matrix(rnorm(20), ncol=5)

where

> x
           [,1]       [,2]      [,3]       [,4]       [,5]
[1,] -0.6264538  1.5952808 0.4874291 -0.3053884 -0.6212406
[2,]  0.1836433  0.3295078 0.7383247  1.5117812 -2.2146999
[3,] -0.8356286 -0.8204684 0.5757814  0.3898432  1.1249309

> y
            [,1]       [,2]        [,3]       [,4]        [,5]
[1,] -0.04493361 0.59390132 -1.98935170 -1.4707524 -0.10278773
[2,] -0.01619026 0.91897737  0.61982575 -0.4781501  0.38767161
[3,]  0.94383621 0.78213630 -0.05612874  0.4179416 -0.05380504
[4,]  0.82122120 0.07456498 -0.15579551  1.3586796 -1.37705956

Then I want to obtain a 3×4 distance matrix distmat, where the element distmat[i,j] is the value given by norm(x[i,]-y[j,],"2") or, equivalently, dist(rbind(x[i,],y[j,])).

My code is as follows
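To make the definition concrete, here is a minimal reference sketch that fills the matrix one pair of rows at a time. The name distmat_reference is hypothetical and not part of the original post; the double loop is meant only as a correctness check for faster approaches, not as a performant solution.

```r
set.seed(1)
x <- matrix(rnorm(15), ncol = 5)
y <- matrix(rnorm(20), ncol = 5)

# Hypothetical helper: direct translation of the definition,
# distmat[i, j] = Euclidean distance between x[i, ] and y[j, ].
distmat_reference <- function(x, y) {
  out <- matrix(NA_real_, nrow = nrow(x), ncol = nrow(y))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(y))) {
      out[i, j] <- sqrt(sum((x[i, ] - y[j, ])^2))
    }
  }
  out
}

ref <- distmat_reference(x, y)
dim(ref)  # one row per row of x, one column per row of y
```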

distmat <- as.matrix(unname(unstack(within(idx<-expand.grid(seq(nrow(x)),seq(nrow(y))), d <-sqrt(rowSums((x[Var1,]-y[Var2,])**2))), d~Var2)))

which gives

> distmat
         [,1]     [,2]     [,3]     [,4]
[1,] 3.016991 1.376622 2.065831 2.857002
[2,] 4.573625 3.336707 2.698124 1.412811
[3,] 3.764925 2.235186 2.743056 3.358577

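However, when x and y have many rows, I don't think my code is elegant or efficient enough.

I am looking for faster and more elegant code in base R to achieve this. Thanks in advance!

Benchmark template

For convenience, you can use the following as a benchmark to check whether your code is faster: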

set.seed(1)
x <- matrix(rnorm(15000), ncol=5)
y <- matrix(rnorm(20000), ncol=5)
# my customized approach
method_ThomasIsCoding <- function() {
  as.matrix(unname(unstack(within(idx<-expand.grid(seq(nrow(x)),seq(nrow(y))), d <-sqrt(rowSums((x[Var1,]-y[Var2,])**2))), d~Var2)))
}
# your approach
method_XXX <- function() {
  # fill with your approach
}
microbenchmark::microbenchmark(
  method_ThomasIsCoding(),
  method_XXX(),
  unit = "relative",
  check = "equivalent",
  times = 10
)

Tags: r, performance, matrix, euclidean-distance

Answer (0 votes)

The proxy package provides this functionality.

library(proxy)
dist(x, y)

     [,1]     [,2]     [,3]     [,4]    
[1,] 3.016991 1.376622 2.065831 2.857002
[2,] 4.573625 3.336707 2.698124 1.412811
[3,] 3.764925 2.235186 2.743056 3.358577
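Since the question asks for base R, a possible alternative that avoids the proxy dependency is to expand the squared distance as |a|^2 + |b|^2 - 2 a·b and compute all row pairs at once with a matrix product. This is a sketch under stated assumptions, not something taken from the answers: the name method_crossprod is hypothetical, and the expansion can lose some precision when two rows are nearly identical, so it is worth checking against a direct computation on your own data.

```r
set.seed(1)
x <- matrix(rnorm(15), ncol = 5)
y <- matrix(rnorm(20), ncol = 5)

# Hypothetical helper (not from the original answers).
# Squared distances for all row pairs: |x_i|^2 + |y_j|^2 - 2 * x_i . y_j
method_crossprod <- function(x, y) {
  sq <- outer(rowSums(x^2), rowSums(y^2), "+") - 2 * tcrossprod(x, y)
  # Floating-point rounding can leave tiny negative values; clamp them to 0
  # before taking the square root.
  sqrt(pmax(sq, 0))
}

d <- method_crossprod(x, y)

# Cross-check against stats::dist on the stacked rows.
full <- as.matrix(dist(rbind(x, y)))
expected <- full[seq_len(nrow(x)), nrow(x) + seq_len(nrow(y))]
all.equal(d, expected, check.attributes = FALSE)
```

Such a function could be dropped into the method_XXX slot of the benchmark template as method_crossprod(x, y); whether it is actually faster at a given size is something only the benchmark can tell.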

Answer (-1 votes)

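Solution: elegant, but about 5 times slower than the original approach according to the benchmark below (not 5 times faster).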

euclidean_distance <- function(p,q){
  sqrt(sum((p - q)^2))
}

distmat = outer(
    as.data.frame(t(x)),
    as.data.frame(t(y)),
    Vectorize(euclidean_distance)
)

Output:

> distmat
         V1       V2       V3       V4
V1 3.016991 1.376622 2.065831 2.857002
V2 4.573625 3.336707 2.698124 1.412811
V3 3.764925 2.235186 2.743056 3.358577

Benchmark:

set.seed(1)
x <- matrix(rnorm(1500), ncol=5)
y <- matrix(rnorm(2000), ncol=5)
# my customized approach
method_ThomasIsCoding <- function() {
  as.matrix(unname(unstack(within(idx<-expand.grid(seq(nrow(x)),seq(nrow(y))), d <-sqrt(rowSums((x[Var1,]-y[Var2,])**2))), d~Var2)))
}
# your approach
method_Jet <- function() {
  # fill with your approach
  outer(as.data.frame(t(x)),as.data.frame(t(y)),Vectorize(euclidean_distance))
}
microbenchmark::microbenchmark(
  method_ThomasIsCoding(),
  method_Jet(),
  unit = "relative",
  check = "equivalent",
  times = 1
)

Output:

                     expr      time
1 method_ThomasIsCoding()  68785152
2            method_Jet() 368550933
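The two timings above can be compared directly. The snippet below is only a quick arithmetic check, assuming both values are reported in the same unit; the variable names are hypothetical. Note that the benchmark was run with times = 1, so a single measurement per method is all there is to compare.

```r
# Timings copied from the benchmark output above (assumed to share one unit).
time_ThomasIsCoding <- 68785152
time_Jet <- 368550933

# Ratio > 1 means method_Jet() took longer than method_ThomasIsCoding().
ratio <- time_Jet / time_ThomasIsCoding
round(ratio, 2)
```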