How do I get all the intermediate clusters from scikit-learn's agglomerative clustering?

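I'm running this relatively simple algorithm.

If I understand the algorithm correctly, clustering down to 8 clusters should also give you the result for every cluster count above 8, right?

Do you actually have to run the code multiple times? If not, how would you retrieve the intermediate clusterings?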

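%%time
import time
import numpy as np
from sklearn import metrics
from sklearn.cluster import AgglomerativeClustering

# K, s, db, w, cont_std and cont_std_ are defined earlier in the notebook (not shown).
for k in K: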
    start_time = time.time()  # Start timing
    
    s[k] = []
    db[k] = []
    
    np.random.seed(123456)  # for reproducibility
    model = AgglomerativeClustering(linkage='ward', connectivity=w.sparse, n_clusters=k)
    y = model.fit(cont_std)
    cont_std_['AHC_k'+ str(k)] = y.labels_
    
    silhouette_score = metrics.silhouette_score(cont_std, y.labels_, metric='euclidean')
    print('silhouette at k=' + str(k) + ': ' + str(silhouette_score))
    s[k].append(silhouette_score)
    
    davies_bouldin_score = metrics.davies_bouldin_score(cont_std, y.labels_)
    print(f'davies bouldin at k={k}: {davies_bouldin_score}')
    db[k].append(davies_bouldin_score)
    
    end_time = time.time()  # End timing
    print(f"Time for k={k}: {end_time - start_time} seconds")  # Print the duration for the cycle
1 Answer

This may be a rather roundabout way of doing it, but it seems to work. I may try to clean it up later.

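import numpy as np

# Assumes `model` is already fitted and `X` is the data it was fitted on
# (`cont_std` in the question). The merges needed for N_CLUSTERS must all be
# recorded in model.children_; fitting with compute_full_tree=True should
# guarantee that.

# Generate the list of nodes throughout the process,
# and an array that for each node index indicates the iteration
# at which it got merged with another (-1 = never merged).
n_samples = len(X)
nodes = [[i] for i in range(n_samples)]
merged_at_stage = -np.ones(n_samples + len(model.children_), dtype=int)
for i, (a, b) in enumerate(model.children_):
    nodes.append(nodes[a] + nodes[b])  # new node gets index n_samples + i
    merged_at_stage[a] = i
    merged_at_stage[b] = i

# For a fixed number of clusters, identify the nodes
# at that point in the process
N_CLUSTERS = 2
n_merges = n_samples - N_CLUSTERS  # merges performed to reach N_CLUSTERS clusters
clusters = [
    nodes[i]
    for i, x in enumerate(merged_at_stage)
    if (
        (x == -1 or x >= n_merges)  # the node hasn't already been merged with another
        and i < n_samples + n_merges  # the node has already been created
    )
]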
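
The same idea can be sketched as a single pass that records the clusters after every merge, so each cluster count is available from one fit. This is only a minimal sketch, not a definitive implementation: the helper names `clusters_at_every_level` and `to_labels` are hypothetical, and `toy_children` is a hand-written merge table in the layout of `model.children_` (row `i` merges two nodes into a new node with index `n_samples + i`), not the output of a real fit.

```python
def clusters_at_every_level(children, n_samples):
    """Replay the merges and return {k: list of clusters}.

    `children` is assumed to follow the layout of `model.children_`:
    row i merges nodes a and b into a new node with index n_samples + i,
    and indices below n_samples are the original samples.
    """
    members = {i: [i] for i in range(n_samples)}  # currently active nodes
    levels = {n_samples: [list(c) for c in members.values()]}
    for i, (a, b) in enumerate(children):
        a, b = int(a), int(b)
        # The two merged nodes disappear and one new node replaces them.
        members[n_samples + i] = members.pop(a) + members.pop(b)
        levels[len(members)] = [sorted(c) for c in members.values()]
    return levels


def to_labels(clusters, n_samples):
    """Turn a list of clusters into one integer label per sample."""
    labels = [0] * n_samples
    for cluster_id, cluster in enumerate(clusters):
        for sample in cluster:
            labels[sample] = cluster_id
    return labels


# Hypothetical merge table for 5 samples (illustration only).
toy_children = [[0, 1], [3, 4], [2, 5], [6, 7]]
levels = clusters_at_every_level(toy_children, n_samples=5)
for k in sorted(levels):
    print(k, levels[k], to_labels(levels[k], 5))
```

With a fitted model this would presumably be called as `clusters_at_every_level(model.children_, len(X))`, and the label lists could then be passed to `metrics.silhouette_score` for each `k` without refitting. If the tree was stopped early, only the cluster counts reached by the recorded merges appear in the result. Storing every level costs memory proportional to the square of the number of samples, so for large data it may be better to keep only the values of `k` you need.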