How to select the most frequent cluster in k-means

Question

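I created k-means clusters from Gensim word2vec vectors, with k set to 3. Now I want to retrieve the cluster with the highest frequency (the most members) and its values.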

import gensim
from gensim.models import Word2Vec
import nltk
from nltk.tokenize import sent_tokenize
from sklearn.cluster import KMeans
import numpy as np
text = "Thank you for keeping me updated on this issue. I'm happy to hear that the issue got resolved after all and you can now use the app in its full functionality again. Also many thanks for your suggestions. We hope to improve this feature in the future. In case you experience any further problems with the app, please don't hesitate to contact me again."
sentences = sent_tokenize(text)
word_text = [sentence.split() for sentence in sentences]
model = Word2Vec(word_text, min_count=1)
# gensim >= 4.0; on gensim 3.x use model.wv[model.wv.vocab] instead
x = model.wv[model.wv.index_to_key]
n_clusters = 3
kmeans = KMeans(n_clusters=n_clusters)
kmeans = kmeans.fit(x)

Answer 1

You can get the label of each data point:

labels = kmeans.labels_

Now you can find the number of samples in each cluster with:

np.unique(labels, return_counts=True)

You can find the cluster centers with kmeans.cluster_centers_
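To go from the counts to the most frequent cluster itself, something along these lines should work. This is only a sketch: the `labels` and `words` arrays below are made-up stand-ins for `kmeans.labels_` and the model's vocabulary (in the same order as the rows passed to `fit`), so the snippet runs without training anything.

```python
import numpy as np

# Hypothetical stand-in for kmeans.labels_ (one cluster label per word vector).
labels = np.array([0, 2, 1, 2, 2, 0, 1, 2])
# Hypothetical stand-in for the vocabulary, in the same order as the rows of x.
words = np.array(["thank", "you", "for", "the", "issue", "app", "again", "me"])

# Count how many points fall into each cluster.
cluster_ids, counts = np.unique(labels, return_counts=True)

# Pick the cluster id with the largest count (first one wins on a tie).
largest_cluster = cluster_ids[np.argmax(counts)]

# Boolean mask selecting the members of that cluster.
mask = labels == largest_cluster
largest_words = words[mask]

print(largest_cluster, largest_words)
```

With the real model, `x[mask]` would give the vectors in that cluster and `kmeans.cluster_centers_[largest_cluster]` its centroid, since the label values index the rows of `cluster_centers_`.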
