scipy linkage format

Question

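I have written my own clustering routine and would like to produce a dendrogram. The easiest way to do this would be to use the scipy dendrogram function. However, this requires the input to be in the same format that the scipy linkage function produces. I cannot find an example of how this output is formatted. I was wondering whether someone out there could enlighten me.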

Answer 1

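This is from the scipy.cluster.hierarchy.linkage() function documentation. I think it is a pretty clear description of the output format:

A (n-1) by 4 matrix Z is returned. At the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n + i. A cluster with an index less than n corresponds to one of the n original observations. The distance between clusters Z[i, 0] and Z[i, 1] is given by Z[i, 2]. The fourth value Z[i, 3] represents the number of original observations in the newly formed cluster.

Do you need something more?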
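As a small illustration of the quoted format, here is a sketch of a linkage matrix written by hand for four observations and handed to scipy. The distances are made-up values chosen only for the example; `is_valid_linkage` and `dendrogram(..., no_plot=True)` are used to check that scipy accepts the matrix without drawing anything.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, is_valid_linkage

# Hand-built linkage matrix for n = 4 observations (indices 0..3).
# Columns: left index, right index, merge distance, size of the new cluster.
# The distances below are illustrative values, not computed from real data.
Z = np.array([
    [0.0, 1.0, 1.0, 2.0],  # merge 0 and 1 -> new cluster 4 (n + 0)
    [2.0, 3.0, 2.0, 2.0],  # merge 2 and 3 -> new cluster 5 (n + 1)
    [4.0, 5.0, 3.0, 4.0],  # merge 4 and 5 -> new cluster 6 (n + 2)
])

# Check that the matrix has the shape and contents scipy expects.
valid = is_valid_linkage(Z)

# no_plot=True computes the dendrogram layout without plotting it.
tree = dendrogram(Z, no_plot=True)
leaf_order = tree["leaves"]
```

A custom clustering routine would need to emit rows in this shape, with float dtype and non-decreasing merge distances, for the dendrogram to come out sensibly.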

Answer 2

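I agree with https://stackoverflow.com/users/1167475/mortonjt that the documentation does not fully explain the indexing of intermediate clusters, while I do agree with https://stackoverflow.com/users/1354844/dkar that the format is otherwise precisely explained.

Using the example data from this question: Tutorial for scipy.cluster.hierarchy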

import numpy as np
import scipy.cluster.hierarchy as hac

a = np.array([[0.1,   2.5],
              [1.5,   .4 ],
              [0.3,   1  ],
              [1  ,   .8 ],
              [0.5,   0  ],
              [0  ,   0.5],
              [0.5,   0.5],
              [2.7,   2  ],
              [2.2,   3.1],
              [3  ,   2  ],
              [3.2,   1.3]])

A linkage matrix can be built using the single method (i.e., the closest matching points):

z = hac.linkage(a, method="single")

 array([[  7.        ,   9.        ,   0.3       ,   2.        ],
        [  4.        ,   6.        ,   0.5       ,   2.        ],
        [  5.        ,  12.        ,   0.5       ,   3.        ],
        [  2.        ,  13.        ,   0.53851648,   4.        ],
        [  3.        ,  14.        ,   0.58309519,   5.        ],
        [  1.        ,  15.        ,   0.64031242,   6.        ],
        [ 10.        ,  11.        ,   0.72801099,   3.        ],
        [  8.        ,  17.        ,   1.2083046 ,   4.        ],
        [  0.        ,  16.        ,   1.5132746 ,   7.        ],
        [ 18.        ,  19.        ,   1.92353841,  11.        ]])

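As the documentation explains, the clusters with an index below n (here: 11) are simply the data points in the original matrix a. The intermediate clusters formed from then on are indexed successively.

Thus, clusters 7 and 9 (the first merge) are merged into cluster 11, and clusters 4 and 6 into cluster 12. Now look at row three, which merges cluster 5 (from a) and cluster 12 (the intermediate cluster, not shown), resulting in a within-cluster distance (WCD) of 0.5. The single method implies that the new WCD of 0.5 is the distance between a[5] and the closest point in cluster 12, which contains a[4] and a[6]. Let's check: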

 In [198]: norm([a[5]-a[4]])
 Out[198]: 0.70710678118654757
 In [199]: norm([a[5]-a[6]])
 Out[199]: 0.5

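This cluster should now be intermediate cluster 13, which is subsequently merged with a[2]. Thus, the new distance should be the smallest distance between the point a[2] and the points a[4], a[5] and a[6].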

 In [200]: norm([a[2]-a[4]])
 Out[200]: 1.019803902718557
 In [201]: norm([a[2]-a[5]])
 Out[201]: 0.58309518948452999
 In [202]: norm([a[2]-a[6]])
 Out[202]: 0.53851648071345048

As can be seen, this also checks out, and it explains the indexing format of the new intermediate clusters.
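The manual checks above can be repeated for every row at once. The following is a sketch under the assumption that `a` holds the same eleven points; the `members` dictionary and the `checks` list are names introduced here for illustration. It rebuilds the membership of each intermediate cluster and compares the closest cross-cluster distance and the member count with columns 2 and 3 of the linkage matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import cdist

a = np.array([[0.1, 2.5], [1.5, 0.4], [0.3, 1.0], [1.0, 0.8],
              [0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [2.7, 2.0],
              [2.2, 3.1], [3.0, 2.0], [3.2, 1.3]])
n = len(a)
z = linkage(a, method="single")

# members[k] lists the original observations contained in cluster k.
members = {k: [k] for k in range(n)}
checks = []
for i, (left, right, dist, size) in enumerate(z):
    left, right = int(left), int(right)
    # Single linkage: distance between the closest pair across the two clusters.
    closest = cdist(a[members[left]], a[members[right]]).min()
    # Row i creates the intermediate cluster with index n + i.
    members[n + i] = members[left] + members[right]
    same_distance = bool(np.isclose(closest, dist))
    same_size = len(members[n + i]) == int(size)
    checks.append(same_distance and same_size)
```

If every entry of `checks` is true, the reading of the format given above holds for the whole matrix, not just for rows three and four.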

Answer 3

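The scipy documentation is accurate, as dkar pointed out... but it is a little hard to turn the returned data into something that is usable for further analysis.

In my opinion they should include the ability to return the data in a tree-like data structure. The code below iterates through the matrix and builds a tree: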

from scipy.cluster.hierarchy import linkage
import numpy as np

a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[100,])
b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[50,])
centers = np.concatenate((a, b),)

def create_tree(centers):
    clusters = {}
    to_merge = linkage(centers, method='single')
    n = len(centers)
    for i, merge in enumerate(to_merge):
        if merge[0] < n:
            # an index below n is an original point: read it from the centers array
            a = centers[int(merge[0])]
        else:
            # otherwise read the intermediate cluster that was created earlier
            a = clusters[int(merge[0])]

        if merge[1] < n:
            b = centers[int(merge[1])]
        else:
            b = clusters[int(merge[1])]
        # scipy indices are 0-based; row i creates cluster n + i
        clusters[n + i] = {
            'children' : [a, b]
        }
        # ^ you could optionally store other info here (e.g. distances)
    return clusters

print(create_tree(centers))
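For comparison, scipy.cluster.hierarchy also provides `to_tree`, which converts a linkage matrix into `ClusterNode` objects. The sketch below uses it to build a nested dictionary similar to the one above; the random test points and the `as_dict` helper are assumptions made for the example, not part of scipy.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

# Six random 2-D points, used only as illustrative input.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))

Z = linkage(points, method="single")
root = to_tree(Z)  # root ClusterNode of the hierarchy


def as_dict(node):
    """Hypothetical helper: convert a ClusterNode into nested dicts."""
    if node.is_leaf():
        return {"id": node.get_id()}
    return {
        "id": node.get_id(),
        "dist": node.dist,
        "children": [as_dict(node.get_left()), as_dict(node.get_right())],
    }


tree = as_dict(root)
leaf_ids = root.pre_order()  # ids of the original observations under the root
```

The node ids follow the same convention as the linkage matrix: leaves are 0 to n-1, and the root is the last cluster formed.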

Answer 4

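Here is another piece of code that performs the same function. This version tracks the distance (size) of each cluster (node_id) and confirms the number of members.

It uses the scipy linkage() function, which is the same foundation that the agglomerative clusterer is built on.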

from scipy.cluster.hierarchy import linkage
import numpy as np
import copy

# data_x is assumed to be an (n_samples, n_features) array defined elsewhere
Z = linkage(data_x, 'ward')

n_points = data_x.shape[0]
# one leaf cluster per original observation; list position equals node_id
clusters = [dict(node_id=i, left=i, right=i, members=[i], distance=0, log_distance=0, n_members=1) for i in range(n_points)]
for z_i in range(Z.shape[0]):
    row = Z[z_i]
    cluster = dict(node_id=z_i + n_points, left=int(row[0]), right=int(row[1]), members=[], log_distance=np.log(row[2]), distance=row[2], n_members=int(row[3]))
    # the new cluster contains the members of both of its children
    cluster["members"].extend(copy.deepcopy(clusters[cluster["left"]]["members"]))
    cluster["members"].extend(copy.deepcopy(clusters[cluster["right"]]["members"]))
    clusters.append(cluster)

on_split = {c["node_id"]: [c["left"], c["right"]] for c in clusters}
up_merge = {c["left"]: {"into": c["node_id"], "with": c["right"]} for c in clusters}
up_merge.update({c["right"]: {"into": c["node_id"], "with": c["left"]} for c in clusters})
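A lookup like up_merge makes it easy to follow a point up through the hierarchy. The sketch below is a self-contained variant of that idea: the five sample points, the `parent` dictionary and the `path_to_root` helper are all assumptions made for the illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Five illustrative points: two tight pairs and one far-away point.
data_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.5], [20.0, 0.0]])
Z = linkage(data_x, "ward")
n_points = data_x.shape[0]

# parent[k] is the id of the cluster that node k was merged into.
parent = {}
for i, row in enumerate(Z):
    parent[int(row[0])] = n_points + i
    parent[int(row[1])] = n_points + i


def path_to_root(node_id):
    """Hypothetical helper: list the node ids from node_id up to the root."""
    path = [node_id]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


path = path_to_root(0)
```

The root is the only node without a parent, so the walk always ends at cluster 2 * n_points - 2.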
