Row/column slicing a torch sparse tensor

Question · 0 votes · 3 answers

I have a PyTorch sparse tensor that I need to slice row/column-wise using the indexing [idx][:,idx], where idx is a list of indices. Using this slicing on an ordinary float tensor gives exactly the result I want. Is it possible to apply the same slicing to a sparse tensor? Example below:

#constructing sparse matrix
import numpy as np
import torch
from torch import autograd

i = np.array([[0,1,2,2],[0,1,2,1]])
v = np.ones(4)
i = torch.from_numpy(i.astype("int64"))
v = torch.from_numpy(v.astype("float32"))
test1 = torch.sparse.FloatTensor(i, v)

#constructing float tensor
test2 = np.array([[1,0,0],[0,1,0],[0,1,1]])
test2 = autograd.Variable(torch.cuda.FloatTensor(test2), requires_grad=False)

#slicing
idx = [1,2]
print(test2[idx][:,idx])

Output:

Variable containing:
 1  0
 1  1
[torch.cuda.FloatTensor of size 2x2 (GPU 0)]

I'm working with a 250,000 x 250,000 adjacency matrix, and I need to slice n rows and n columns at a time by simply sampling n random indices. Since the dataset is so large, converting it to a more convenient data type is not practical.

Can I get the same slicing result on test1? Is it possible at all? If not, is there a workaround?

Right now I'm running my model with the following workaround "hack":

idx = sorted(random.sample(range(0, np.shape(test1)[0]), 9000))
test1 = test1AsCsr[idx][:,idx].todense().astype("int32")
test1 = autograd.Variable(torch.cuda.FloatTensor(test1), requires_grad=False)

where test1AsCsr is my test1 converted to a numpy CSR matrix. This solution works, but it is very slow and keeps my GPU utilisation very low, since it constantly has to read from and write to CPU memory.
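For reference, a minimal sketch of the kind of round trip described above, assuming test1AsCsr is built with scipy (the scipy construction is my assumption; only the variable names come from the question):

import random
import numpy as np
import torch
from scipy.sparse import csr_matrix

# build the CSR matrix once on the CPU (hypothetical reconstruction of test1AsCsr)
i = np.array([[0,1,2,2],[0,1,2,1]])
v = np.ones(4, dtype="float32")
test1AsCsr = csr_matrix((v, (i[0], i[1])), shape=(3, 3))

# per batch: sample indices, slice rows then columns, densify, copy to the GPU
idx = sorted(random.sample(range(test1AsCsr.shape[0]), 2))
dense_block = np.asarray(test1AsCsr[idx][:, idx].todense(), dtype="float32")
test1 = torch.as_tensor(dense_block)  # .cuda() on a GPU machine; this per-batch copy is the bottleneck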

Edit: it turned out that a non-sparse result tensor is fine.

python slice sparse-matrix pytorch
3 Answers

5 votes

This question is a few years old, but better late than never.

This is the function I use for slicing sparse tensors (helper functions further down).

def slice_torch_sparse_coo_tensor(t, slices):
    """
    params:
    -------
    t: tensor to slice
    slices: slice for each dimension

    returns:
    --------
    t[slices[0], slices[1], ..., slices[n]]
    """

    t = t.coalesce()
    assert len(slices) == len(t.size())
    for i in range(len(slices)):
        if type(slices[i]) is not torch.Tensor:
            slices[i] = torch.tensor(slices[i], dtype=torch.long)

    indices = t.indices()
    values = t.values()
    for dim, slice in enumerate(slices):
        invert = False
        # if the slice covers more than ~60% of the dimension, it is cheaper to
        # match against the complement and invert the mask afterwards
        if t.size(dim) * 0.6 < len(slice):
            invert = True
            all_nodes = torch.arange(t.size(dim))
            unique, counts = torch.cat([all_nodes, slice]).unique(return_counts=True)
            slice = unique[counts == 1]
        if slice.size(0) > 400:
            mask = ainb_wrapper(indices[dim], slice)
        else:
            mask = ainb(indices[dim], slice)
        if invert:
            mask = ~mask
        indices = indices[:, mask]
        values = values[mask]

    return torch.sparse_coo_tensor(indices, values, t.size()).coalesce()

Usage (this took 2.4 s on my machine):

indices = torch.randint(low= 0, high= 200000, size= (2, 1000000))
values = torch.rand(size=(1000000,))
t = torch.sparse_coo_tensor(indices, values, size=(200000, 200000))
idx = torch.arange(1000)
slice_torch_sparse_coo_tensor(t, [idx, idx])

Output:

tensor(indices=tensor([[ 13,  62,  66,  78, 134, 226, 233, 266, 299, 344, 349,
                        349, 369, 396, 421, 531, 614, 619, 658, 687, 769, 792,
                        810, 840, 926, 979],
                       [255, 479, 305, 687, 672, 867, 444, 559, 772,  96, 788,
                        980, 423, 699, 911, 156, 267, 721, 381, 781,  97, 271,
                        840, 292, 487, 185]]),
       values=tensor([0.4260, 0.4816, 0.8001, 0.8815, 0.3971, 0.4914, 0.7068,
                      0.2329, 0.4038, 0.1757, 0.7758, 0.3210, 0.2593, 0.8290,
                      0.1320, 0.4322, 0.7529, 0.8341, 0.8128, 0.4457, 0.4100,
                      0.1618, 0.4097, 0.3088, 0.6942, 0.5620]),
       size=(200000, 200000), nnz=26, layout=torch.sparse_coo)

Timing for slice_torch_sparse_coo_tensor:

%timeit slice_torch_sparse_coo_tensor(t, [torch.randperm(200000)[:500], torch.arange(200000)])

output:
    1.08 s ± 447 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And for the built-in torch.index_select (implemented here):

%timeit t.index_select(0, torch.arange(100))

output:
    56.7 s ± 4.87 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
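For completeness, a two-dimensional slice with the built-in would look roughly like the sketch below (my own illustration, not part of the original answer; it assumes index_select on sparse COO tensors handles both dimensions, which it does in recent PyTorch releases):

# rows first, then columns; the result is a (1000, 1000) sparse tensor
idx = torch.arange(1000)
sliced = t.index_select(0, idx).index_select(1, idx)
dense_block = sliced.to_dense()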

These are the helper functions I use for this; the function ainb computes which elements of a are in b. I found it on the internet a while ago, but I can't find the post to link to anymore.

import torch
def ainb(a,b):
    """gets mask for elements of a in b"""

    size = (b.size(0), a.size(0))

    if size[0] == 0: # Prevents error in torch.Tensor.max(dim=0)
        return torch.tensor([False]*a.size(0), dtype= torch.bool)
        
    a = a.expand((size[0], size[1]))
    b = b.expand((size[1], size[0])).T

    mask = a.eq(b).max(dim= 0).values

    return mask

def ainb_wrapper(a, b, splits = .72):
    """Applies ainb on chunks of a and concatenates the results, to keep the quadratic broadcast small."""
    inds = int(len(a)**splits)

    tmp = [ainb(a[i*inds:(i+1)*inds], b) for i in list(range(inds))]

    return torch.cat(tmp)

Since the function scales quadratically with the number of elements, I added a wrapper that splits the input into chunks and then concatenates the outputs. This is more efficient when using only the CPU; I'm not sure whether that still holds on a GPU, so if anyone could test it I'd appreciate it :)
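As an aside (my addition, not part of the original answer): on PyTorch 1.10 and newer, the built-in torch.isin performs the same membership test without the quadratic broadcast, so it could replace ainb / ainb_wrapper:

import torch

def ainb_isin(a, b):
    # mask of elements of a that appear in b, via the built-in (PyTorch >= 1.10)
    return torch.isin(a, b)

# sanity check against a naive broadcasted comparison
a = torch.randint(0, 1000, (10000,))
b = torch.randperm(1000)[:200]
assert torch.equal(ainb_isin(a, b), (a[:, None] == b[None, :]).any(dim=1))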

This is my first post, so feedback on the quality of the post is also appreciated.


2 votes

Possible answer for 2-D sparse indexing

Find below an answer that uses several PyTorch methods (torch.eq(), torch.unique(), torch.sort(), etc.) to output a compact sliced tensor of shape (len(idx), len(idx)).

I tested several edge cases (unordered idx, zeros in v, i with multiple identical index pairs, etc.), though I may have forgotten some. Performance should also be checked.

import torch
import numpy as np

def in1D(x, labels):
    """
    Sub-optimal equivalent to numpy.in1D().
    Hopefully this feature will be properly covered soon
    c.f. https://github.com/pytorch/pytorch/issues/3025
    Snippet by Aron Barreira Bordin
    Args:
        x (Tensor):             Tensor to search values in
        labels (Tensor/list):   1D array of values to search for

    Returns:
        Tensor: Boolean tensor y of same shape as x, with y[ind] = True if x[ind] in labels

    Example:
        >>> in1D(torch.FloatTensor([1, 2, 0, 3]), [2, 3])
        ByteTensor([0, 1, 0, 1])
    """
    mapping = torch.zeros(x.size()).byte()
    for label in labels:
        mapping = mapping | x.eq(label)
    return mapping


def compact1D(x):
    """
    "Compact" values 1D uint tensor, so that all values are in [0, max(unique(x))].
    Args:
        x (Tensor): uint Tensor

    Returns:
        Tensor: uint Tensor of same shape as x

    Example:
        >>> compact1D(torch.ByteTensor([5, 8, 7, 3, 8, 42]))
        ByteTensor([1, 3, 2, 0, 3, 4])
    """
    x_sorted, x_sorted_ind = torch.sort(x, descending=True)
    x_sorted_unique, x_sorted_unique_ind = torch.unique(x_sorted, return_inverse=True)
    x[x_sorted_ind] = x_sorted_unique_ind
    return x

# Input sparse tensor:
i = torch.from_numpy(np.array([[0,1,4,3,2,1],[0,1,3,1,4,1]]).astype("int64"))
v = torch.from_numpy(np.arange(1, 7).astype("float32"))
test1 = torch.sparse.FloatTensor(i, v)
print(test1.to_dense())
# tensor([[ 1.,  0.,  0.,  0.,  0.],
#         [ 0.,  8.,  0.,  0.,  0.],
#         [ 0.,  0.,  0.,  0.,  5.],
#         [ 0.,  4.,  0.,  0.,  0.],
#         [ 0.,  0.,  0.,  3.,  0.]])

# note: test1[1, 1] = v[1] + v[5] = 2 + 6 = 8
#       since both i[:,1] and i[:,5] are [1,1]

# Input slicing indices:
idx = [4,1,3]

# Getting the elements in `i` which correspond to `idx`:
v_idx = in1D(i, idx).byte()
v_idx = v_idx.sum(dim=0).squeeze() == i.size(0) # or `v_idx.all(dim=1)` for pytorch 0.5+
v_idx = v_idx.nonzero().squeeze()

# Slicing `v` and `i` accordingly:
v_sliced = v[v_idx]
i_sliced = i.index_select(dim=1, index=v_idx)

# Building sparse result tensor:
i_sliced[0] = compact1D(i_sliced[0])
i_sliced[1] = compact1D(i_sliced[1])

# To make sure to have a square dense representation:
size_sliced = torch.Size([len(idx), len(idx)])
res = torch.sparse.FloatTensor(i_sliced, v_sliced, size_sliced)

print(res)
# torch.sparse.FloatTensor of size (3,3) with indices:
# tensor([[ 0,  2,  1,  0],
#         [ 0,  1,  0,  0]])
# and values:
# tensor([ 2.,  3.,  4.,  6.])

print(res.to_dense())
# tensor([[ 8.,  0.,  0.],
#         [ 4.,  0.,  0.],
#         [ 0.,  3.,  0.]])

Previous answer for 1-D sparse indexing

Here is a (probably sub-optimal and not covering all edge cases) solution, following the intuitions shared in a related open issue (hopefully this feature will be properly covered soon):

# Constructing a sparse tensor a bit more complicated for the sake of demo:
i = torch.LongTensor([[0, 1, 5, 2]])
v = torch.FloatTensor([[1, 3, 0], [5, 7, 0], [9, 9, 9], [1,2,3]])
test1 = torch.sparse.FloatTensor(i, v)

# note: if you directly have sparse `test1`, you can get `i` and `v`:
# i, v = test1._indices(), test1._values()

# Getting the slicing indices:
idx = [1,2]

# Preparing to slice `v` according to `idx`.
# For that, we gather the list of indices `v_idx` such that i[v_idx[k]] == idx[k]:
i_squeeze = i.squeeze()
v_idx = [(i_squeeze == j).nonzero() for j in idx] # <- doesn't seem optimal...
v_idx = torch.cat(v_idx, dim=1)

# Slicing `v` accordingly:
v_sliced = v[v_idx.squeeze()][:,idx]

# Now defining your resulting sparse tensor.
# I'm not sure what kind of indexing you want, so here are 2 possibilities:
# 1) "Dense" indixing:
test1x = torch.sparse.FloatTensor(torch.arange(v_idx.size(1)).long().unsqueeze(0), v_sliced)
print(test1x)
# torch.sparse.FloatTensor of size (3,2) with indices:
#
#  0  1
# [torch.LongTensor of size (1,2)]
# and values:
#
#  7  0
#  2  3
# [torch.FloatTensor of size (2,2)]

# 2) "Sparse" indixing using the original `idx`:
test1x = torch.sparse.FloatTensor(autograd.Variable(torch.LongTensor(idx)).unsqueeze(0), v_sliced)
# note: this indexing would fail if elements of `idx` were not in `i`.
print(test1x)
# torch.sparse.FloatTensor of size (3,2) with indices:
#
#  1  2
# [torch.LongTensor of size (1,2)]
# and values:
#
#  7  0
#  2  3
# [torch.FloatTensor of size (2,2)]

0 votes

I made some tweaks to Prezt's top answer, because it didn't work for me in some cases.

def ainb(a, b):
    """gets mask for elements of a in b"""
    indices = torch.zeros_like(a, dtype=torch.uint8)
    for elem in b:
        indices = indices | (a == elem)

    return indices.type(torch.bool)

def slice_torch_sparse_coo_tensor(t, slices):
    """
    params:
    -------
    t: tensor to slice
    slices: slice for each dimension (":", an int, or a list of indices)

    returns:
    --------
    t[slices[0], slices[1], ..., slices[n]]
    """
    new_shape = []
    new_slices = []
    for dim, s in enumerate(slices):
        if s == ":":
            new_shape.append(t.shape[dim])
            new_slices.append(torch.arange(t.shape[dim]))
        elif isinstance(s, int):
            sl = torch.tensor([s])
            new_slices.append(sl)
            new_shape.append(sl.shape[0])
        elif isinstance(s, list):
            sl = torch.tensor(s)
            new_slices.append(sl)
            new_shape.append(sl.shape[0])
        else:
            raise NotImplementedError(f"Slicing with {s} is not supported")

    t = t.coalesce()
    assert len(slices) == len(t.size())
    for i in range(len(new_slices)):
        if len(new_slices[i].shape) > 1:
            new_slices[i] = torch.squeeze(new_slices[i])

    # keep only the entries whose coordinates appear in every slice
    indices = t.indices()
    values = t.values()
    for dim, sl in enumerate(new_slices):
        mask = ainb(indices[dim], sl)
        indices = indices[:, mask]
        values = values[mask]

    # remap the surviving coordinates into the coordinate system of the slice
    # (note: duplicate indices within a slice are not supported)
    for dim, sl in enumerate(new_slices):
        lookup = torch.zeros(t.size(dim), dtype=torch.long)
        lookup[sl] = torch.arange(sl.size(0))
        indices[dim] = lookup[indices[dim]]

    return torch.sparse_coo_tensor(indices, values, new_shape).coalesce()
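A small usage example for the tweaked version above (my own illustration, not part of the original answer), reusing the sparse tensor from the question:

i = torch.tensor([[0, 1, 2, 2], [0, 1, 2, 1]])
v = torch.ones(4)
t = torch.sparse_coo_tensor(i, v, size=(3, 3))

print(slice_torch_sparse_coo_tensor(t, [[1, 2], [1, 2]]).to_dense())
# tensor([[1., 0.],
#         [1., 1.]])

print(slice_torch_sparse_coo_tensor(t, [":", [1]]).to_dense())
# tensor([[0.],
#         [1.],
#         [1.]])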