I want my script to look at the abstract column for a given year to create clusters.

Question

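Thanks for your time, guys. I need your help, please.

This looks at all the columns that contain text, but it should look only at the abstract column and produce clusters for each year. So the 5 clusters should be based on the abstract column alone, and each iteration should look only at the abstracts from that specific year. From the results, I can't tell whether it is looking at just one year. Second, you can tell it is using all the columns, such as the publication and year columns, because some clusters contain "1999" and "ieee". Those words are not common in that year's abstracts. So I just need this updated to look at the abstract column for a given year.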

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# textblob_tokenizer and abstract_1999 come from earlier in my script (not shown here)
vec = TfidfVectorizer(tokenizer=textblob_tokenizer,
                      stop_words='english',
                      norm='l1',
                      max_features=1000,
                      use_idf=True)
km = KMeans(n_clusters=5)

# TF-IDF matrix built from the 1999 abstracts
matrix = vec.fit_transform(abstract_1999)

km.fit(matrix)
labels = km.predict(matrix)  # cluster index for each abstract
order_centroids = km.cluster_centers_.argsort()[:, ::-1]
terms = vec.get_feature_names_out()  # get_feature_names() on scikit-learn < 1.0

# reduce the features to 2D
pca = PCA(n_components=2, random_state=0)
reduced_features = pca.fit_transform(matrix.toarray())

# reduce the cluster centers to 2D
reduced_cluster_centers = pca.transform(km.cluster_centers_)

print("Top 8 words per cluster:")
for i in range(5):
    top_eight_words = [terms[ind] for ind in order_centroids[i, :8]]
    print("Cluster {}: {}".format(i+1, '  '.join(top_eight_words)))
