Weighted clustering with pycluster

Question

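I managed to adapt a code snippet that shows how to use PyCluster's k-means clustering algorithm. I would like to weight the data points, but unfortunately I can only weight the features. Am I missing something? Or is there a trick I can use to make some points count more than others?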

import numpy as np
import Pycluster as pc

points = np.asarray([
    [1.0, 20, 30, 50],
    [1.2, 15, 34, 50],
    [1.6, 13, 20, 55],
    [0.1, 16, 40, 26],
    [0.3, 26, 30, 23],
    [1.4, 20, 28, 20],
])

# would like to specify 6 weights, one for each row of `points`,
# but `weight` only takes one value per feature (4 columns)
weights = np.asarray([1.0, 1.0, 1.0, 1.0])

clusterid, error, nfound = pc.kcluster(
    points, nclusters=2, transpose=0, npass=10, method='a', dist='e', weight=weights
)
centroids, _ = pc.clustercentroids(points, clusterid=clusterid)
print(centroids)

Answer 1

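Weighting individual data points is not a feature of the KMeans algorithm. That follows from how the algorithm is defined: it is not available in pycluster, MLlib, or TrustedAnalytics.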

You can, however, add duplicate data points. For example, if you want the second data point to count twice as much, change the list to:

points = np.asarray([
    [1.0, 20, 30, 50],
    [1.2, 15, 34, 50],
    [1.2, 15, 34, 50],
    [1.6, 13, 20, 55],
    [0.1, 16, 40, 26],
    [0.3, 26, 30, 23],
    [1.4, 20, 28, 20],
])
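For more than a couple of points, the duplication can be generated rather than typed by hand. The following is a minimal sketch using only NumPy; the `counts` array and the names `weighted_points` and `row_index` are illustrative assumptions, not part of the original answer, and the approach only works for integer weights.

```python
import numpy as np

points = np.asarray([
    [1.0, 20, 30, 50],
    [1.2, 15, 34, 50],
    [1.6, 13, 20, 55],
    [0.1, 16, 40, 26],
    [0.3, 26, 30, 23],
    [1.4, 20, 28, 20],
])

# Hypothetical integer weights, one per row of `points`:
# the second point should count twice, all others once.
counts = np.asarray([1, 2, 1, 1, 1, 1])

# Repeat each row as many times as its weight says.
weighted_points = np.repeat(points, counts, axis=0)

# Remember which original row each duplicated row came from,
# so cluster ids can be mapped back afterwards.
row_index = np.repeat(np.arange(len(points)), counts)

print(weighted_points.shape)  # (7, 4)
```

`weighted_points` can then be passed to the clustering call in place of `points`. Copies of the same point are identical, so they always land in the same cluster, and `row_index` tells you which original row each returned cluster id belongs to.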

Answer 2

You can now use sample_weight in the fit method of sklearn's KMeans. Here is an example.
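As a rough sketch of what this could look like with the data from the question (assuming a scikit-learn version whose `KMeans.fit` accepts `sample_weight`; the weight values, `n_init` and `random_state` below are illustrative choices, not taken from the answer):

```python
import numpy as np
from sklearn.cluster import KMeans

points = np.asarray([
    [1.0, 20, 30, 50],
    [1.2, 15, 34, 50],
    [1.6, 13, 20, 55],
    [0.1, 16, 40, 26],
    [0.3, 26, 30, 23],
    [1.4, 20, 28, 20],
])

# Illustrative weights, one per row of `points`:
# the second point counts twice as much as the others.
sample_weight = np.asarray([1.0, 2.0, 1.0, 1.0, 1.0, 1.0])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(points, sample_weight=sample_weight)

print(km.labels_)           # cluster id for each of the 6 points
print(km.cluster_centers_)  # weighted centroids, shape (2, 4)
```

Unlike the duplication trick, the weights here do not have to be integers.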
