How do I do math on the GPU with NumPy in Python?

Question

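Wassup! I'm building a rendering engine in Python, which is a slow but fun process. I've been using NumPy, and in a previous question I got some help speeding up the NumPy math that solves vertex positions in perspective, but it still runs on the CPU. As I add more features (such as texture mapping), I know the amount of math to solve will grow exponentially, so I need to figure out how to do it in parallel on the GPU.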

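I have no experience with this. I've tried Numba's GPU features (such as @cuda.jit and @vectorize) as well as some CuPy, but they either didn't work with my code at all or ran slower than the CPU equivalents. The documentation all feels cryptic to a beginner, so I'm struggling.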

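Which approach is likely to work best for my situation, and what am I missing? I want to solve a lot of math on the GPU, and I'm flexible about how to do it. Are there particular characteristics that make a math problem a good or bad fit for the GPU?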
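One rough way to judge whether a computation suits the GPU is to compare how much arithmetic it does with how much data it moves, and how much work each call contains. The sketch below is a back-of-the-envelope count for the render_all_verts math; the operation counts and the helper names (flops_per_vertex, bytes_per_vertex, arithmetic_intensity) are my own assumptions, not profiler measurements.

```python
# Back-of-the-envelope cost model for the per-vertex projection math.
# The operation counts are rough assumptions, not profiler measurements.

def flops_per_vertex():
    subtract_shift = 3   # (x, y, z) - shift
    mat_vec = 9 + 6      # 3x3 camera matrix: 9 multiplies, 6 additions
    divide = 1           # focus / |z|
    scale_xy = 2         # x and y multiplied by focus / |z|
    return subtract_shift + mat_vec + divide + scale_xy

def bytes_per_vertex(itemsize=4):
    # Read 3 float32 coordinates and write 3 back (camera data is negligible)
    return 6 * itemsize

def arithmetic_intensity(itemsize=4):
    """Floating-point operations per byte of vertex data moved."""
    return flops_per_vertex() / bytes_per_vertex(itemsize)

if __name__ == "__main__":
    n = 1000
    print(f"FLOPs per call for {n} vertices: {flops_per_vertex() * n}")
    print(f"Arithmetic intensity: {arithmetic_intensity():.3f} FLOP/byte")
```

Under these assumptions the math does under one operation per byte and only about 21,000 operations per call for 1000 vertices. That is a small, memory-bound workload, where kernel-launch and host-to-device copy overhead can easily outweigh the GPU's speed; such work tends to pay off only when it is batched into much larger arrays and the data stays on the device between calls.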

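My GPU is an NVIDIA GeForce RTX 3050 Ti Laptop GPU, I'm running Python 3.11, and I have the latest versions of NumPy, Numba, and CuPy installed, but I'm open to other options. I also installed the CUDA toolkit through Anaconda, although I don't know how to use it.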

Although this is mostly a general question about using the GPU, here is the code I'm currently trying to run on it. I'd like answers about using the GPU that aren't limited to this code:

import time
import numpy as np
from numba import njit  # Unused packages: vectorize, cuda

@njit
def render_all_verts_NJIT(vert_array, unit_vec, shift, focus):
    data = (vert_array - shift).T
    data = np.dot(unit_vec, data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T

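Here is an equivalent function without @njit, in case @njit causes problems on the GPU (@njit needs an extra 1.2 seconds to compile on the first call, but after that, this version without it is about 60% slower on my CPU):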

def render_all_verts(vert_array, unit_vec, shift, focus):
    data = (vert_array - shift).T
    data = np.dot(unit_vec, data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T
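Since CuPy mirrors much of the NumPy API, one way to keep a single code path is to write the function with only array operators and methods, so the same body runs on NumPy arrays (CPU) or CuPy arrays (GPU). This is a minimal sketch: render_all_verts_xp is a hypothetical name, and the CuPy lines in the comment assume CuPy and a CUDA device are available; only the NumPy path is exercised here.

```python
import numpy as np

def render_all_verts_xp(vert_array, unit_vec, shift, focus):
    """Project (n, 3) vertices; works with NumPy or CuPy arrays.

    Only operators (-, @, *, /) and the built-in abs() are used, so the
    array library of the inputs decides where the math runs.
    """
    # (v - shift) @ R.T is the row-layout form of (R @ (v - shift).T).T
    data = (vert_array - shift) @ unit_vec.T
    data[:, :2] *= focus / abs(data[:, 2:3])
    return data

# GPU usage (assumes CuPy is installed and a CUDA device is present):
#   import cupy as cp
#   verts_gpu = cp.asarray(original_vertices)   # copy to the GPU once
#   out_gpu = render_all_verts_xp(verts_gpu, cp.asarray(camera_vector),
#                                 cp.asarray(camera_shift), float(camera_focus))
#   out = cp.asnumpy(out_gpu)                   # copy back only when needed

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    verts = (rng.random((5, 3), dtype=np.float32) - 0.5) * 100
    R = np.eye(3, dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    print(render_all_verts_xp(verts, R, shift, np.float32(5)))
```

The design choice here is to avoid np.dot and np.abs in favor of @ and abs(), which dispatch to whichever library owns the arrays, and to keep the vertices on the device across frames so the copy cost is paid once.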

This is just the code I use to test and time the functions:

# TESTING AND TIMING THE FUNCTIONS ---------------------------------------
# The next few lines make an array, similar to the 3D points I'd be rendering.
# It contains n vertices with random float coordinate values from -m to m
n = 1000
m = 50
original_vertices = (np.random.sample(size=(n, 3)).astype(np.float32)-0.5)*2 * m
print('Original vertices:\n', original_vertices, '\n')

# The next few variables are camera attributes that will be used to render vertices
camera_vector = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32)
camera_shift = np.array([0, 0, 10], dtype=np.float32)
camera_focus = np.single(5)

# This empty array is the same shape as original_vertices. The output results will be saved here.
rendered_vertices = np.empty(original_vertices.shape)

# This repeatedly renders the given points using the given function and times it.
def render_example(function, example_array, loop_times):
    start_time = time.time()
    for i in range(loop_times):
        output = function(example_array, camera_vector, camera_shift, camera_focus)
    print(f'Time for function {str(function)} rendering test array of shape {np.shape(example_array)} {loop_times} times...')
    print(f'--- {time.time() - start_time} seconds ---')
    return output

render_times = 10000
rendered_vertices = render_example(render_all_verts_NJIT, original_vertices, render_times)
rendered_vertices = render_example(render_all_verts, original_vertices, render_times)
print('\nLast calculated render of vertices:\n', rendered_vertices)
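Two things can skew timings like these: the first call to an @njit function includes its compile time, and GPU libraries such as CuPy and PyTorch launch work asynchronously, so a timer can stop before the GPU has finished. A minimal sketch of a fairer timer, assuming you pass a synchronization function for GPU code (for example cupy.cuda.Device().synchronize or torch.cuda.synchronize); benchmark is a hypothetical helper, not part of any library.

```python
import time

def benchmark(fn, args, loops=1000, warmup=3, sync=None):
    """Return the average wall-clock seconds per call of fn(*args).

    warmup: untimed calls made first, so JIT compilation (e.g. Numba's
            @njit) and one-time GPU initialization are excluded.
    sync:   optional callable that blocks until queued GPU work finishes.
    """
    for _ in range(warmup):
        fn(*args)
    if sync is not None:
        sync()  # make sure warm-up work is done before starting the clock
    start = time.perf_counter()
    for _ in range(loops):
        fn(*args)
    if sync is not None:
        sync()  # wait for the GPU before stopping the clock
    return (time.perf_counter() - start) / loops

if __name__ == "__main__":
    import numpy as np
    verts = np.random.default_rng(0).random((1000, 3), dtype=np.float32)
    per_call = benchmark(np.sum, (verts,), loops=100)
    print(f"np.sum: {per_call * 1e6:.1f} µs per call")
```

With the question's code, a call would look like benchmark(render_all_verts_NJIT, (original_vertices, camera_vector, camera_shift, camera_focus)), which reports per-call time with the compile cost excluded.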

Answer 1

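You can use PyTorch with device='cuda', like this. Depending on the precision you need, it may also be worth checking the performance of other dtypes (torch.float16 or torch.bfloat16 instead of torch.float32).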
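Before switching to torch.float16, it may be worth checking how much precision the projection can tolerate. The sketch below uses NumPy's float16 and float32 on the CPU only as a stand-in for the same IEEE formats on the GPU (bfloat16 has no NumPy equivalent, so it is not covered). project and max_error are hypothetical helpers, and the vertex range is chosen so that z stays away from zero.

```python
import numpy as np

def project(verts, rot, shift, focus):
    """Same math as render_all_verts, in row layout, in the inputs' dtype."""
    data = (verts - shift) @ rot.T
    data[:, :2] *= focus / np.abs(data[:, 2:3])
    return data

def max_error(dtype, n=1000, seed=0):
    """Largest absolute deviation from a float64 reference."""
    rng = np.random.default_rng(seed)
    verts = np.column_stack([
        rng.uniform(-50, 50, n),  # x
        rng.uniform(-50, 50, n),  # y
        rng.uniform(20, 50, n),   # z, so z - 10 stays in [10, 40]
    ])
    rot = np.eye(3)
    shift = np.array([0.0, 0.0, 10.0])
    ref = project(verts, rot, shift, 5.0)
    low = project(verts.astype(dtype), rot.astype(dtype),
                  shift.astype(dtype), dtype(5))
    return float(np.max(np.abs(low.astype(np.float64) - ref)))

if __name__ == "__main__":
    for dt in (np.float32, np.float16):
        print(dt.__name__, max_error(dt))
```

If the float16 error is acceptable for screen-space coordinates, the smaller type halves the memory traffic, which matters for memory-bound math like this.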

import torch

# original_vertices comes from the question's test code
camera_vector = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=torch.float32, device='cuda')
camera_shift = torch.tensor([0, 0, 10], dtype=torch.float32, device='cuda')
camera_focus = 5
torch_vertices = torch.tensor(original_vertices, device='cuda')

def render_all_verts_torch(vert_tensor, camera_vector, camera_shift, camera_focus):
    # Subtract the shift, matching (vert_array - shift) in the NumPy version;
    # the subtraction allocates a new tensor, so vert_tensor is not modified
    data = (vert_tensor.to(torch.float32) - camera_shift).t()
    data = torch.mm(camera_vector, data)
    data[:2] *= camera_focus / torch.abs(data[2:3])
    return data.t()
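A pitfall that applies to PyTorch tensors and NumPy arrays alike: a dtype conversion that has nothing to convert returns the original object rather than a copy. PyTorch documents this for Tensor.to() (self is returned when the dtype and device already match), and np.asarray(x, dtype=...) behaves the same way, so an in-place operator such as += on the result also changes the caller's array. A small NumPy demonstration that runs on the CPU; shift_inplace and shift_copy are hypothetical names.

```python
import numpy as np

def shift_inplace(verts, shift):
    data = np.asarray(verts, dtype=np.float32)  # no copy if verts is already float32
    data -= shift                               # so this also modifies verts
    return data

def shift_copy(verts, shift):
    # "-" allocates a new array, leaving verts untouched
    return np.asarray(verts, dtype=np.float32) - shift

if __name__ == "__main__":
    verts = np.zeros((2, 3), dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    shift_copy(verts, shift)
    print(verts[0])  # still all zeros
    shift_inplace(verts, shift)
    print(verts[0])  # z component is now -10
```

In a render loop that reuses the same vertex tensor every frame, the in-place variant would shift the vertices again on every call.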

You can also get rid of the transposes on the GPU:

def render_all_verts_torch(vert_tensor, camera_vector, camera_shift, camera_focus):
    # Keep vertices as rows: (v - shift) @ R.T equals (R @ (v - shift).T).T
    data = vert_tensor.to(torch.float32) - camera_shift
    data = torch.mm(data, camera_vector.t())
    data[:, :2] *= camera_focus / torch.abs(data[:, 2:3])
    return data
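The transpose-free layout relies on the identity (R @ (v - s).T).T == (v - s) @ R.T, so the row-layout product needs the transposed camera matrix; multiplying by R itself matches only when R is symmetric, as the identity matrix in the test data happens to be. A quick NumPy check with a random, non-symmetric matrix (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal((4, 3))  # 4 vertices as rows
s = rng.standard_normal(3)       # camera shift
R = rng.standard_normal((3, 3))  # generic, non-symmetric camera matrix

column_form = (R @ (v - s).T).T  # layout used in the question
row_form = (v - s) @ R.T         # transpose-free layout

print(np.allclose(column_form, row_form))      # True
print(np.allclose(column_form, (v - s) @ R))   # False for this random R
```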
