How to ignore implicit zeros using scipy.sparse.csr_matrix.min?

Question

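Goal

I have a list of roughly 500K points in 3D space. I want to find the two coordinates with the largest first-nearest-neighbor distance.

Approach

I am using scipy to compute a sparse distance matrix: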

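from scipy.spatial import cKDTree

# points: array of shape (N, 3); 40 is the leaf size
tree = cKDTree(points, leafsize=40)
# pairwise distances up to 0.01, stored as a sparse matrix
spd = tree.sparse_distance_matrix(tree, max_distance=0.01)
spo = spd.tocsr()
# drop explicitly stored zeros (the diagonal)
spo.eliminate_zeros()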

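I eliminate the explicit zeros to deal with the diagonal elements, where the distance between each point and itself is computed.

I now want to find the coordinates of the minimum distance in each row/column, which should correspond to the first nearest neighbor of each point, something like: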

spo.argmin(axis=0)

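By finding the maximum distance among the elements of that array, I should be able to find the two elements with the largest first-nearest-neighbor distance.

Problem

The problem is that the min and argmin functions of scipy.sparse.csr_matrix also take the implicit zeros into account, which I do not want for this application. How can I solve this? With a matrix this large, both performance and memory are a concern. Or is there a completely different approach to what I am trying to do?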
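To make the symptom concrete, here is a minimal sketch on a toy matrix (the 3x3 values are made up for illustration and are not from the original data). It shows that min over a row returns 0 as soon as that row has any unstored entry, even though every stored value is positive.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy symmetric "distance" matrix (illustrative values only).
dense = np.array([
    [0.0, 0.2, 0.0],
    [0.2, 0.0, 0.5],
    [0.0, 0.5, 0.0],
])
toy = csr_matrix(dense)  # zeros in `dense` are not stored: they become implicit

# min() counts every unstored entry as 0, so each row's minimum is 0 here.
row_min = toy.min(axis=1).toarray().ravel()

# The explicitly stored values are all strictly positive.
stored = toy.data
print(row_min, stored.min())
```

The smallest stored value is 0.2, but the row-wise minimum never reports it because each row contains at least one implicit zero.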

Answer 1

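I did not find a solution using the distance matrix, but it seems I overlooked the most obvious solution: the tree's query method.

So, to find the largest distance between first nearest neighbors, this is what I did (vectors is a numpy array of shape (N, 3)):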

import numpy as np
from scipy.spatial import cKDTree

def max_nn_distance(vectors, leaf_size=16):
    tree = cKDTree(vectors, leafsize=leaf_size)
    # get the indexes of the first nearest neighbor of each vertex
    # we use k=2 because k=1 are the points themselves with distance 0
    nn1 = tree.query(vectors, k=2)[1][:, 1]
    # get the vectors corresponding to those indexes: row i holds the
    # first nearest neighbor of vectors[i]
    nn1_vec = vectors[nn1]
    # the distance between each point and its first nearest neighbor
    nn_dist = np.sqrt(np.sum((vectors - nn1_vec) ** 2, axis=1))
    # maximum distance
    return np.max(nn_dist)
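For readers who still want to stay with the sparse matrix, one possible workaround is to reduce over the stored values of each CSR row directly. The sketch below is an assumption-laden illustration, not something from the original thread: row_min_stored is a hypothetical helper name, it relies on np.minimum.reduceat over the CSR data and indptr arrays, and it assumes explicit zeros have already been removed (as eliminate_zeros does in the question). Rows with no stored entry are reported as infinity.

```python
import numpy as np
from scipy.sparse import csr_matrix

def row_min_stored(mat):
    """Row-wise minimum over explicitly stored entries only (hypothetical helper).

    Rows without any stored entry get np.inf.
    """
    mat = mat.tocsr()
    nonempty = np.diff(mat.indptr) > 0           # rows with at least one stored value
    out = np.full(mat.shape[0], np.inf)
    if nonempty.any():
        # Start offset of every non-empty row inside mat.data; consecutive
        # offsets delimit exactly one row each because empty rows add nothing.
        starts = mat.indptr[:-1][nonempty]
        data = mat.data[:mat.indptr[-1]]
        out[nonempty] = np.minimum.reduceat(data, starts)
    return out

# Toy example (illustrative values): the last row has no stored entries.
toy = csr_matrix(np.array([
    [0.0, 0.2, 0.0],
    [0.2, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]))
mins = row_min_stored(toy)
print(mins)
```

This works on the existing data array without densifying the matrix, so memory use stays proportional to the number of stored entries. Taking np.max over the finite values of the result would then give the largest first-nearest-neighbor distance within the 0.01 cutoff used to build the matrix.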

Answer 2

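In case anyone finds this later (as I did), here is a slightly simpler version. Thanks to DIN14970 for all the research on this.

It turns out that .query returns the distances (which may not have been the case when this was originally asked), so there is no need to compute them.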

import numpy as np
from scipy.spatial import cKDTree

def max_nn_distance(vectors, leaf_size=16):
    tree = cKDTree(vectors, leafsize=leaf_size)
    # query returns (distances, indexes); take the distances
    # we use k=2 because k=1 are the points themselves with distance 0
    nn1_distance = tree.query(vectors, k=2)[0][:, 1]
    # maximum distance
    return np.max(nn1_distance)
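Since the original goal was to find the two coordinates, not only the distance, the same query call can also be used to recover the pair. The following is a rough sketch under a few assumptions: most_isolated_pair is a hypothetical helper name, the sample points are invented, and the input contains no duplicate points (with duplicates, the first column of the result is not guaranteed to be the point itself).

```python
import numpy as np
from scipy.spatial import cKDTree

def most_isolated_pair(vectors):
    """Return (i, j, d): the point i whose nearest neighbor is farthest away,
    that neighbor j, and their distance d (hypothetical helper)."""
    tree = cKDTree(vectors)
    # dist[:, 0] is each point's distance to itself; dist[:, 1] is the
    # distance to its first true neighbor, idx[:, 1] that neighbor's index.
    dist, idx = tree.query(vectors, k=2)
    i = int(np.argmax(dist[:, 1]))
    return i, int(idx[i, 1]), float(dist[i, 1])

# Illustrative points: the last one sits far away from the rest.
pts = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [10.0, 0.0, 0.0],
])
i, j, d = most_isolated_pair(pts)
print(i, j, d)
```

Here pts[i] and pts[j] are the two coordinates asked about in the question, and d matches what the max over the nearest-neighbor distances would return.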
