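I implemented a function that finds the "closest" data point to each centroid computed by the K-means clustering algorithm. Is there an sklearn function that would let me find the M points closest to each centroid?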
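One option is to fit a nearest-neighbors model (sklearn's NearestNeighbors) on our dataset. Then we can query that model with the K-means centroids to retrieve their neighbors. Like this: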
# Copyright 2024 Google LLC.
# SPDX-License-Identifier: Apache-2.0
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
# random dense embeddings for 100 points with 10 dimensions.
dataset = np.random.rand(100,10)
# fit K-means with 3 clusters on our dataset.
kme = KMeans(n_clusters=3)
kme.fit(dataset)
# we should have 3 vectors for 3 centroids.
print(kme.cluster_centers_.shape) # (3, 10)
# initialize NearestNeighbors with 5 neighbors and fit it on our dataset.
knn = NearestNeighbors(n_neighbors=5, metric='cosine')
knn.fit(dataset)
# Use the model to query the centroids' neighbors.
distances, indices = knn.kneighbors(kme.cluster_centers_)
for centroid, distance_from_centroid, index in zip(kme.cluster_centers_, distances, indices):
    print(centroid, distance_from_centroid, index)
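The final loop prints 3 lines, one per centroid. Each contains the centroid's vector followed by the distances and indices of its 5 nearest neighbors. Because the model was built with metric='cosine', the distances are cosine distances.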
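K-means itself assigns points by Euclidean distance, and the query above can return points that belong to a different cluster than the centroid being queried. If "closest" should mean Euclidean distance and only points assigned to that centroid's own cluster, a minimal sketch might look like the following. The helper name `closest_members_per_centroid` is hypothetical, and it assumes `data` is the same array the model was fitted on, so that `labels_` lines up with its rows.

```python
import numpy as np
from sklearn.cluster import KMeans


def closest_members_per_centroid(data, kmeans, m):
    """For each centroid, return indices of up to m nearest points in its own cluster.

    Hypothetical helper; assumes `data` is the array `kmeans` was fitted on.
    """
    # transform() returns the Euclidean distance from each point to each centroid,
    # with shape (n_samples, n_clusters).
    dist_to_centroids = kmeans.transform(data)
    result = {}
    for k in range(kmeans.n_clusters):
        # Row indices of the points assigned to cluster k.
        members = np.flatnonzero(kmeans.labels_ == k)
        # Sort those members by distance to centroid k and keep the first m.
        order = np.argsort(dist_to_centroids[members, k])[:m]
        result[k] = members[order]
    return result


rng = np.random.default_rng(0)
dataset = rng.random((100, 10))
kme = KMeans(n_clusters=3, n_init=10, random_state=0).fit(dataset)

closest = closest_members_per_centroid(dataset, kme, m=5)
for k, idx in closest.items():
    print(k, idx)
```

Each entry maps a cluster index to row indices into `dataset`, ordered from nearest to farthest. A cluster with fewer than `m` members simply returns all of them.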