SciPy dendrogram

Question  Votes: 0  Answers: 1

I am doing hierarchical document clustering, and my workflow is roughly this:

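import pandas
from scipy.cluster.hierarchy import ward, fcluster
from sklearn.metrics.pairwise import cosine_similarity

df = pandas.read_csv(file, delimiter='\t', index_col=0)  # document-term matrix (very sparse)
dist_matrix = cosine_similarity(df)  # note: these are similarities, not distances

linkage_matrix = ward(dist_matrix)  # a square 2-D input is treated as observation vectors
labels = fcluster(linkage_matrix, 5, criterion='maxclust')  # flat labels, at most 5 clusters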

I expect this to give 5 clusters, but when I plot the dendrogram

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

fig, ax = plt.subplots(figsize=(15, 20))  # set size
dn = dendrogram(linkage_matrix, orientation="right", ax=ax)  # returns a dict, so keep `ax` for the axes
plt.tick_params(
    axis='x',  # changes apply to the x-axis
    which='both',  # both major and minor ticks are affected
    bottom=False,  # ticks along the bottom edge are off
    top=False,  # ticks along the top edge are off
    labelbottom=False)  # labels along the bottom edge are off

plt.tight_layout()  # show plot with tight layout

plt.savefig('ward_clusters.png', dpi=200)  # save figure as ward_clusters.png

I get the following figure:

[Figure: the resulting dendrogram; image not included]

Judging by the colors, I can see 3 clusters, not 5! Am I misunderstanding what the dendrogram means?

python scikit-learn scipy hierarchical-clustering
1 answer
0 votes
  • First, if you only want 5 clusters, just use labels (the output of the fcluster line, which your plotting code never uses).
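
A minimal sketch of why the colors and the flat clusters can disagree, and one way to make them agree. The dendrogram colors are controlled by dendrogram's color_threshold argument (by default 0.7 * max(Z[:, 2])), not by fcluster, so the plot can show 3 colored groups while labels holds 5 cluster ids. The data X, the blob centers, the grey color, and the midpoint threshold below are illustrative assumptions, not part of the original post.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, ward

# Hypothetical stand-in data: 5 well-separated blobs of 10 points each.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [50, 0], [0, 50], [50, 50], [100, 100]])
X = np.vstack([c + rng.normal(size=(10, 2)) for c in centers])

Z = ward(X)  # linkage matrix; Z[:, 2] holds the merge heights in increasing order
k = 5

# Flat clustering: labels[i] is the cluster id (1..k) of observation i.
labels = fcluster(Z, k, criterion='maxclust')
n_flat = len(np.unique(labels))

# Pick a height between the merge that leaves k clusters (Z[-k])
# and the merge that reduces them to k - 1 (Z[-(k - 1)]).
threshold = (Z[-k, 2] + Z[-(k - 1), 2]) / 2

# no_plot=True only computes the layout; drop it (and pass ax=...) to draw the figure.
dn = dendrogram(Z, color_threshold=threshold,
                above_threshold_color='grey', no_plot=True)

# Links above the threshold are grey; each subtree below it gets its own color.
n_colors = len(set(dn['color_list']) - {'grey'})

print(n_flat, n_colors)
```

With the threshold chosen this way, the number of colored subtrees should match the number of flat clusters as long as no cluster is a single leaf (a singleton has no links to color) and the merge heights around the cut are not tied.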