我有N个浮点向量,我想计算它们之间的成对归一化L2距离。对于向量u和v,归一化的L2距离定义为:|| u / ||u||_2 - v / ||v||_2 ||_2, where || ... ||_2 is the L2 norm (i.e. square root of the sum of squares)
我写了一个表示成对距离矩阵的类:
class PairwiseDistanceMatrix {
private let count: Int
private let buffer: [Float]
private let bufferPointer: UnsafeBufferPointer<Float>
let rows: Int
let columns: Int
init(with vectors: [[Float]]) {
count = vectors.count
let len = vDSP_Length(vectors[0].count)
var norm: Float = .nan
var divRes = [Float](repeating: .nan, count: Int(len))
// Normalizing the vectors:
let norms = Array(0..<count).map { (index) -> [Float] in
vDSP_svesq(vectors[index], 1, &norm, len)
norm = norm.squareRoot()
vDSP_vsdiv(vectors[index], 1, &norm, &divRes, 1, len)
return divRes
}
var mutableBuffer = [Float](repeating: 0.0, count: count * count)
let mutableBufferPointer = UnsafeMutableBufferPointer<Float>.init(start: &mutableBuffer, count: mutableBuffer.count)
// Computing the distances between the normalized vectors
var distancesq: Float = .nan
for i in 0..<(count - 1) {
for j in i..<count {
let index = i * count + j
let symetricIndex = j * count + i
vDSP_distancesq(norms[i], 1, norms[j], 1, &distancesq, len)
mutableBufferPointer[index] = distancesq.squareRoot()
mutableBufferPointer[symetricIndex] = mutableBufferPointer[index]
}
}
buffer = mutableBuffer
bufferPointer = UnsafeBufferPointer<Float>.init(mutableBufferPointer)
rows = count
columns = count
}
func elementAt(row: Int, column: Int) -> Float {
return bufferPointer[row * count + column]
}
}
[当前,对于5000个向量,此代码在〜600毫秒内在iPhone X上运行。正如预期的那样,大多数都采用了嵌套循环。 (向量归一化所需的时间少于2ms)。我很确定此代码可以优化。欢迎任何想法或指示。
让我们忽略标准化,因为这不是问题,并且您已经有了一个相当有效的解决方案。
假设向量已经归一化,那么我们要填写第ij个条目是|| v_i-v_j ||的矩阵。与您当前的蛮力方法相比,这可以更有效地完成。观察到
||v_i - v_j||^2 = (v_i - v_j)*(v_i - v_j)
= v_i * v_i - 2 v_i * v_j + v_j * v_j
= ||v_i||^2 + ||v_j||^2 - 2 v_i * v_j
= 2(1 - v_i * v_j)
因此,我们需要计算成对的点积矩阵,然后将f(x)= sqrt(2-2 * x)应用于矩阵的每个条目。
成对点积的矩阵正是将向量作为矩阵A的行打包在一起并计算A * transpose(A)时得到的,这可以使用BLAS提供的SSYRK
函数来完成。