I am trying to cluster my Faiss vectors:
vector_store = FAISS.load_local("embeddings_of_songs", embeddings=embeddings)
But all I could find on Google was Faiss approximate search.
I am trying to run k-means clustering on vector_store. Any help would be much appreciated, thanks!
First, you need to extract the embeddings from the vector store:
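# LangChain's FAISS wrapper has no .get() (that is Chroma's API); read the vectors from the underlying faiss index.
# reconstruct_n works on flat indexes such as IndexFlatL2; other index types may not support it.
index = vector_store.index
embs = index.reconstruct_n(0, index.ntotal)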
Then apply clustering with sklearn:
from sklearn.cluster import KMeans

# Use a predefined number of clusters
num_clusters = 5
# Perform k-means clustering (random_state is fixed for reproducibility)
kmeans = KMeans(n_clusters=num_clusters, random_state=0)
cluster_assignments = kmeans.fit_predict(embs)
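Check whether you have any labels (for example, in the source field of the metadata). If you do, you can use the adjusted Rand index (ARI) to check the quality of the clusters (see the documentation). Here I assume the labels are in categories: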
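from sklearn.metrics import adjusted_rand_score

# Ground-truth label for each vector, in the same order as embs
categories = ...
# Evaluate clustering using adjusted Rand index (ARI)
ari = adjusted_rand_score(categories, cluster_assignments)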
print(f"Cluster Assignments: {cluster_assignments}")
print(f"Adjusted Rand Index (ARI): {ari}")
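To make the ARI score easier to interpret, here is a small pure-Python sketch of how it can be computed from pair counts. The function name adjusted_rand_index and the toy labels are made up for illustration; in practice sklearn's adjusted_rand_score is the one to use.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Compute the ARI from the contingency counts of two labelings."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare, treat as trivially identical

    # Counts of points per (true label, predicted label) cell and per label.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of point pairs that share a cell, a true label, or a predicted label.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2          # agreement of a perfect match
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (sum_cells - expected) / (maximum - expected)


# Same grouping with swapped label names: ARI ignores the names themselves.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Grouping that cuts across the true labels.
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_grouping, crossed_grouping)
```

A score near 1 means the clusters match the labels, a score near 0 means the agreement is about what chance would give, and negative scores mean it is worse than chance.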