Fast tensor dot on sparse arrays with a GPU, in any programming language?

Question

I am currently working with two multidimensional arrays, arr and cost. arr is dense with shape (width, height, m, n), while cost is sparse with shape (width, height, width, height).

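Values: width and height are around […], […] is less than […], and […] is less than […]. The sparse cost is strictly block-sparse: for cost[i,j,:,:], only some k*k block is nonzero, where […] is less than […].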

Now I want this:

result = np.tensordot(cost, arr, axes=[[2,3],[0,1]])

result then has shape (width, height, m, n), the same as arr. However, the cost array is far too large to define as a dense array.

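So the question is: how can I do a fast tensor dot on sparse arrays, preferably on a GPU?

Ideas in any programming language are welcome, including but not limited to Python, C++, and Julia.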

I have tried:

CPU versions (multithreaded): C++ (using only std::vector), numpy/scipy, Julia

GPU versions: cupy, CUDA.jl

The C++ version works fine with some simple for loops over std::vector, but it is not fast enough.

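With cupy and CUDA.jl, both run very fast in small tests where width and height are set to small values and both cost and arr are defined as dense. But I don't know how to adapt them to a sparse version.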

Answer 1

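Your tensordot can be evaluated with other numpy tools and expressed as a dot of 2d arrays.

For illustration purposes, I won't try to make the dimensions large or make anything sparse. In any case, the sparsity of cost isn't all that clear.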

Small dimensions, with reproducible values:

In [852]: width, height, m, n = 10,10,5,4    
In [853]: arr = np.arange(width*height*m*n).reshape(width,height,m,n)    
In [854]: cost = np.arange(10**4).reshape(10,10,10,10)

Your tensordot:

In [856]: res = np.tensordot(cost,arr,((2,3),(0,1)))    
In [857]: res.shape
Out[857]: (10, 10, 5, 4)

is equivalent to using einsum:

In [859]: res1=np.einsum('ijkl,klmn->ijmn',cost,arr)    
In [860]: res1.shape
Out[860]: (10, 10, 5, 4)

In [861]: np.allclose(res,res1)
Out[861]: True

Using matmul/dot with 2d arrays:

In [863]: res2=cost.reshape(10*10,10*10)@arr.reshape(10*10,-1)
In [864]: res2.shape
Out[864]: (100, 20)

In [865]: np.allclose(res, res2.reshape(res.shape))
Out[865]: True

tensordot is doing the same kind of reshaping.
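Since the contraction reduces to a 2d matrix product, one possible route to a sparse version is to store cost as a scipy.sparse CSR matrix of shape (width*height, width*height) and multiply it by the reshaped arr. The following is only a minimal sketch under assumptions that are not in the question: each cost[i,j,:,:] is assumed to hold exactly one k*k block, stored compactly in a hypothetical blocks array together with hypothetical row0/col0 arrays giving the block's top-left corner. The helper names are made up for illustration, and the sizes are kept tiny so the result can be checked against the dense tensordot.

```python
import numpy as np
from scipy import sparse


def block_sparse_cost_to_csr(blocks, row0, col0, width, height):
    """Pack per-(i, j) k*k blocks into a (width*height, width*height) CSR matrix.

    Assumed layout (hypothetical, for illustration only):
    blocks[i, j] is the nonzero k*k block of cost[i, j, :, :], and its
    top-left corner sits at (row0[i, j], col0[i, j]).
    """
    k = blocks.shape[-1]
    # Row index of the 2d matrix: the flattened (i, j), repeated for each block entry.
    ij = np.arange(width * height).reshape(width, height, 1, 1)
    rows = np.broadcast_to(ij, blocks.shape)
    # Column index: the flattened (p, q) position of each block entry.
    p = row0[:, :, None, None] + np.arange(k)[None, None, :, None]
    q = col0[:, :, None, None] + np.arange(k)[None, None, None, :]
    cols = p * height + q
    return sparse.csr_matrix(
        (blocks.ravel(), (rows.ravel(), cols.ravel())),
        shape=(width * height, width * height),
    )


def sparse_tensordot(cost_csr, arr):
    """Same contraction as np.tensordot(cost, arr, axes=[[2,3],[0,1]])."""
    width, height, m, n = arr.shape
    out = cost_csr @ arr.reshape(width * height, m * n)
    return out.reshape(width, height, m, n)


# Small test sizes so the dense reference fits in memory.
rng = np.random.default_rng(0)
width, height, m, n, k = 10, 10, 5, 4, 3
arr = rng.standard_normal((width, height, m, n))
blocks = rng.standard_normal((width, height, k, k))
row0 = rng.integers(0, width - k + 1, size=(width, height))
col0 = rng.integers(0, height - k + 1, size=(width, height))

cost_csr = block_sparse_cost_to_csr(blocks, row0, col0, width, height)
result = sparse_tensordot(cost_csr, arr)

# Dense reference, only feasible at these small sizes.
cost = cost_csr.toarray().reshape(width, height, width, height)
expected = np.tensordot(cost, arr, axes=[[2, 3], [0, 1]])
print(np.allclose(result, expected))
```

The sparse matrix stores at most width*height*k*k values, rather than (width*height)**2. CuPy ships a cupyx.scipy.sparse module with a similar csr_matrix, so the same formulation may carry over to a GPU, though that has not been tried here.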
