I am trying to understand how KNN is used in spectral clustering. The affinity matrix I get below has several values of 0.5.
I added the affinity matrix to its transpose and took the average, but there are still a few discrepancies with the result this code gives.
In particular, I would like to know where the 0.5 values below come from.
from sklearn.cluster import SpectralClustering
import matplotlib.pyplot as plt
import numpy as np

X = np.array([[1, 0], [1, 1], [1, 2], [2, 0], [2, 1],
              [3, 5], [3, 6], [4, 7]])
sc = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                        n_neighbors=3, assign_labels='discretize',
                        random_state=0).fit(X)
print(sc.affinity_matrix_)
(0, 1) 1.0
(0, 3) 1.0
(0, 0) 1.0
(1, 4) 0.5
(1, 0) 1.0
(1, 2) 1.0
(1, 1) 1.0
(2, 4) 0.5
(2, 1) 1.0
(2, 2) 1.0
(3, 0) 1.0
(3, 4) 1.0
(3, 3) 1.0
(4, 2) 0.5
(4, 1) 0.5
(4, 3) 1.0
(4, 4) 1.0
(5, 7) 1.0
(5, 6) 1.0
(5, 5) 1.0
(6, 7) 1.0
(6, 5) 1.0
(6, 6) 1.0
(7, 5) 1.0
(7, 6) 1.0
(7, 7) 1.0
The reason you get 0.5 in the affinity matrix is these lines in the sklearn SpectralClustering implementation:
connectivity = estimator.kneighbors_graph(X=X, mode="connectivity")
self.affinity_matrix_ = 0.5 * (connectivity + connectivity.T)
It adds the matrix corresponding to the k-nearest-neighbors graph to its transpose and multiplies the result by 0.5. Essentially, this ensures that the resulting affinity matrix is symmetric, which has nice properties when computing the eigendecomposition (the next step in spectral clustering). In particular, the resulting eigenvectors will be real and orthogonal.
As a concrete example, consider the following matrix:
connectivity = np.array([[1., 0., 1.],
                         [0., 1., 1.],
                         [1., 0., 1.]])
Its transpose is:

connectivity.T = np.array([[1., 0., 1.],
                           [0., 1., 0.],
                           [1., 1., 1.]])
Adding these two matrices together and multiplying by 0.5 gives:

affinity = np.array([[1., 0., 1. ],
                     [0., 1., 0.5],
                     [1., 0.5, 1.]])
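To tie this back to your data, you can reproduce the same construction directly with sklearn.neighbors.kneighbors_graph. This is a sketch; passing include_self=True is my assumption, but it matches the 1.0 entries on the diagonal of your output.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.array([[1, 0], [1, 1], [1, 2], [2, 0], [2, 1],
              [3, 5], [3, 6], [4, 7]])

# 0/1 connectivity matrix: row i marks point i's 3 nearest neighbors
# (counting the point itself, hence the 1s on the diagonal).
connectivity = kneighbors_graph(X, n_neighbors=3, mode="connectivity",
                                include_self=True)

# Symmetrize the directed kNN graph the same way SpectralClustering does.
affinity = 0.5 * (connectivity + connectivity.T)

# One-directional edges (i is a neighbor of j, but j is not a neighbor
# of i) end up as 0.5; mutual edges stay 1.0.
print(affinity.toarray())
```

This also explains the 0.5 entries in your output: for example, point 4 at (2, 1) has point 1 among its nearest neighbors, but point 1's three slots are taken by points 0, 2, and itself, so the edge only exists in one direction and the symmetrized (1, 4) and (4, 1) entries become 0.5.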