Python: vectorized cumulative count

Question (votes: 1, answers: 1)

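I have a NumPy array and want to count the occurrences of each value cumulatively, i.e. for every element, how many times the same value has already appeared before it: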

in  = [0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0, ...]
out = [0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4, ...]
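To pin down the definition, here is a plain (non-vectorized) reference loop that produces `out` from `in`. It is only a sketch for checking vectorized versions against; the name `cumcount_loop` is ours, chosen for illustration.

```python
import numpy as np

def cumcount_loop(a):
    """Reference cumulative count: out[i] is the number of times
    a[i] has already appeared in a[:i]."""
    seen = {}                          # value -> occurrences seen so far
    out = np.empty(len(a), dtype=int)
    for i, v in enumerate(a):
        out[i] = seen.get(v, 0)        # occurrences of v before position i
        seen[v] = out[i] + 1
    return out

a = np.array([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0])
print(cumcount_loop(a))  # [0 0 1 1 0 0 2 3 1 2 3 1 2 4]
```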

I was wondering whether it would be better to create a (sparse) matrix with col = i and row = in[i]:

       1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1
       0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0
       0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0
       0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0

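Then we could compute the cumsum along each row and read off the values at the positions where the cumsum increases.

But wouldn't taking the cumsum of the sparse matrix make it dense? Is there an efficient way to do this?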
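The matrix idea from the question can be sketched with a dense 0/1 matrix instead of a sparse one. This assumes `in` is a non-empty 1-D array of small non-negative integers (they are used directly as row indices); the name `cumcount_onehot` is hypothetical.

```python
import numpy as np

def cumcount_onehot(a):
    """Matrix approach from the question, using a dense 0/1 matrix.
    Assumes a is a non-empty 1-D array of small non-negative integers."""
    a = np.asarray(a)
    n = len(a)
    cols = np.arange(n)
    # Row = value, column = position: M[a[i], i] = 1
    M = np.zeros((a.max() + 1, n), dtype=np.int64)
    M[a, cols] = 1
    # Running count of each value along the positions (a dense k x n array)
    C = M.cumsum(axis=1)
    # C[a[i], i] counts a[i] in a[:i+1]; subtract 1 to exclude position i itself
    return C[a, cols] - 1

print(cumcount_onehot([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0]))
```

The (max + 1) x n intermediate is the cost of this approach: once a value first appears, its row in the cumulative sum stays non-zero, so the cumsum of the sparse matrix is largely dense, which is the concern raised above.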

arrays numpy vectorization counting cumsum
1 answer (2 votes)

Here's one vectorized approach using sorting -

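import numpy as np

def cumcount(a):
    a = np.asarray(a)
    # Store length of array
    n = len(a)
    if n == 0:
        return np.empty(0, dtype=int)

    # Get sorted indices (used later on too) and store the sorted array.
    # A stable sort is required so that equal values keep their original
    # order; the default quicksort does not guarantee this.
    sidx = a.argsort(kind='stable')
    b = a[sidx]

    # Mask of shifts, i.e. boundaries between groups of equal values
    m = b[1:] != b[:-1]

    # Get indices of those shifts
    idx = np.flatnonzero(m)

    # ID array whose cumsum counts 0, 1, 2, ... inside each group:
    # every step is +1, except at the start of a group, where the step
    # cancels the count accumulated by the previous group
    id_arr = np.ones(n, dtype=int)
    if idx.size > 0:  # skip when all values are equal (no shifts)
        id_arr[idx[1:]+1] = -np.diff(idx)+1
        id_arr[idx[0]+1] = -idx[0]
    id_arr[0] = 0
    c = id_arr.cumsum()

    # Finally re-arrange those cumulative values back to original order
    out = np.empty(n, dtype=int)
    out[sidx] = c
    return out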

Sample run -

In [66]: a
Out[66]: array([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0])

In [67]: cumcount(a)
Out[67]: array([0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4])
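The same sorting idea can also be written without the `id_arr` bookkeeping: after a stable sort, the rank of each element inside its run of equal values is its sorted position minus the start of its run, and `np.repeat` broadcasts each run start over the run. This is a sketch of that variant, not the answer's own code; `cumcount_repeat` is a hypothetical name.

```python
import numpy as np

def cumcount_repeat(a):
    """Sorting-based cumulative count using run starts and np.repeat."""
    a = np.asarray(a)
    n = len(a)
    # Stable sort keeps equal values in their original order
    sidx = np.argsort(a, kind="stable")
    b = a[sidx]
    # Start index of each run of equal values in the sorted array
    starts = np.flatnonzero(np.r_[True, b[1:] != b[:-1]])
    # Length of each run
    lengths = np.diff(np.r_[starts, n])
    # Rank within run = sorted position - start of that run
    rank = np.arange(n) - np.repeat(starts, lengths)
    # Scatter the ranks back to the original positions
    out = np.empty(n, dtype=int)
    out[sidx] = rank
    return out

print(cumcount_repeat(np.array([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0])))
```

Both versions are O(n log n) because of the sort, and neither depends on the range of the values, unlike the matrix approach from the question.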
© www.soinside.com 2019 - 2024. All rights reserved.