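I'm using chatintents (https://github.com/dborrelli/chat-intents) for automatic clustering, with sentence-transformers to embed the sentences. The problem: when I set a minimum and maximum number of clusters and run the search, the number of clusters it finds falls outside that range.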
Code:
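from hyperopt import hp

# model is a SentenceTransformer loaded earlier; bayesian_search is the chatintents helper
X = model.encode(utterances["FCD_COG_INPUT_TEXT"].to_list())

# Search space: UMAP (n_neighbors, n_components) and HDBSCAN (min_cluster_size)
hspace = {
    "n_neighbors": hp.choice("n_neighbors", range(3, 16)),
    "n_components": hp.choice("n_components", range(100, 115)),
    "min_cluster_size": hp.choice("min_cluster_size", range(50, 65)),
    "random_state": 42,
}

# Desired range for the number of clusters
label_lower = 20
label_upper = 30
max_evals = 100

best_params_use, best_clusters_use, trials_use = bayesian_search(
    X,
    space=hspace,
    label_lower=label_lower,
    label_upper=label_upper,
    max_evals=max_evals,
)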
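One possible explanation, stated as an assumption rather than verified against the current package source: the chatintents write-up describes label_lower/label_upper as a soft constraint, where a fixed penalty (0.15) is added to the cost when the cluster count is out of range, instead of rejecting the trial. The hypothetical penalized_loss function below only sketches that idea. If it holds, a very clean 3-cluster result can still score better than a noisy in-range one.

```python
# Hypothetical sketch of a soft-constrained loss, modeled on the penalty
# scheme described for chatintents (assumed, not taken from the package).
def penalized_loss(label_count, cost, label_lower, label_upper, penalty=0.15):
    """Return cost plus a fixed penalty when label_count is out of range.

    cost is assumed to be the fraction of points that HDBSCAN assigns
    with low confidence (lower is better).
    """
    out_of_range = label_count < label_lower or label_count > label_upper
    return cost + (penalty if out_of_range else 0.0)


# Made-up numbers: a tidy 3-cluster solution vs. a noisy 25-cluster one.
few = penalized_loss(label_count=3, cost=0.0054, label_lower=20, label_upper=30)
in_range = penalized_loss(label_count=25, cost=0.20, label_lower=20, label_upper=30)
print(few, in_range)
print(few < in_range)  # True: the out-of-range solution still wins
```

Because the penalty is a constant, any in-range configuration whose cost exceeds an out-of-range configuration's cost by more than 0.15 loses to it.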
Result:
100%|██████████| 100/100 [59:49<00:00, 35.90s/trial, best loss: 0.15540102619497703]
best:
{'min_cluster_size': 51, 'n_components': 106, 'n_neighbors': 7, 'random_state': 42}
label count: 3
In this run it found 3 clusters, even though label_lower = 20 and label_upper = 30; on other runs it finds more than 100.
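If staying inside the range matters more than the raw score, one option is to ignore out-of-range trials when picking the winner. With hyperopt, trials_use.results should hold the result dicts returned by the objective. The sketch assumes each dict carries "loss" and "label_count" keys, which is an assumption about what the objective returns; the function name best_in_range and the example data are hypothetical.

```python
def best_in_range(results, label_lower, label_upper):
    """Return the lowest-loss result whose label_count lies in
    [label_lower, label_upper], or None if no trial qualifies.

    Each result is assumed to be a dict with 'loss' and 'label_count'
    keys (hypothetical shape; check what your objective returns).
    """
    in_range = [
        r for r in results
        if label_lower <= r["label_count"] <= label_upper
    ]
    if not in_range:
        return None
    return min(in_range, key=lambda r: r["loss"])


# Made-up example results, not from the run above.
results = [
    {"loss": 0.1554, "label_count": 3},
    {"loss": 0.31, "label_count": 24},
    {"loss": 0.27, "label_count": 28},
    {"loss": 0.19, "label_count": 112},
]
print(best_in_range(results, 20, 30))  # {'loss': 0.27, 'label_count': 28}
```

If this returns None for the real trials, no configuration in the search space produced 20 to 30 clusters. That may point at the space itself (for example, a min_cluster_size of 50 to 64 tends to push HDBSCAN toward fewer, larger clusters).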