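We have two arrays: arr1 (with string elements) and arr2 (with integers). I want to clip the first arr2[i] characters from arr1[i]. These arrays are very large, so I want to implement this in Numba CUDA. The plain Python implementation is as follows: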
arr1 = ['abc', 'def', 'xyz']
arr2 = [1, 2, 3]

def python_clipper(arr1, arr2):
    for i in range(len(arr1)):
        arr1[i] = arr1[i][arr2[i]:]
    return arr1

print(python_clipper(arr1, arr2))  # ['bc', 'f', '']
The implementation above works fine. But when I create a CUDA function from this Python function like this:
from numba import cuda

@cuda.jit()
def cuda_clipper(arr1, arr2):
    i = cuda.grid(1)
    arr1[i] = arr1[i][arr2[i]:]

# one thread per block, one block per element
blockspergrid, threadsperblock = len(arr1), 1
cuda_clipper[blockspergrid, threadsperblock](arr1, arr2)
print(arr1)  # expected: ['bc', 'f', '']
I get the following error:
numba.core.errors.TypingError: Failed in cuda mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function _empty_string at 0x7f0456884d30>) found for signature:
>>> _empty_string(int64, int64, bool)
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload in function 'register_jitable.<locals>.wrap.<locals>.ov_wrap': File: numba/core/extending.py: Line 159.
With argument(s): '(int64, int64, bool)':
Rejected as the implementation raised a specific error:
NumbaRuntimeError: Failed in nopython mode pipeline (step: native lowering)
NRT required but not enabled
During: lowering "s = call $10load_global.3(kind, char_width, length, is_ascii, func=$10load_global.3, args=[Var(kind, unicode.py:276), Var(char_width, unicode.py:276), Var(length, unicode.py:276), Var(is_ascii, unicode.py:276)], kws=(), vararg=None, varkwarg=None, target=None)" at /mnt/local-raid10/workspace/user/anaconda3/envs/condaenv/lib/python3.9/site-packages/numba/cpython/unicode.py (277)
raised from /mnt/local-raid10/workspace/user/anaconda3/envs/condaenv/lib/python3.9/site-packages/numba/core/runtime/context.py:19
During: resolving callee type: Function(<function _empty_string at 0x7f0456884d30>)
During: typing of call at /mnt/local-raid10/workspace/user/anaconda3/envs/condaenv/lib/python3.9/site-packages/numba/cpython/unicode.py (1700)
File "../../anaconda3/envs/condaenv/lib/python3.9/site-packages/numba/cpython/unicode.py", line 1700:
def getitem_slice(s, idx):
<source elided>
# It's heterogeneous in kind OR stride != 1
ret = _empty_string(kind, span, is_ascii)
^
During: typing of intrinsic-call at /mnt/local-raid10/workspace/user/trim/trim_new_implementation/string_numba.py (143)
File "string_numba.py", line 143:
def cuda_clipper(arr1,arr2):
<source elided>
i = cuda.grid(1)
arr1[i] = arr1[i][arr2[i]:]
^
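My impression is that slicing the string is the problem, because a similar implementation works fine with arrays. I tried turning arr1 into an array of arrays, but the preprocessing itself takes long enough that CUDA no longer gives any performance benefit. How can I work directly with str in Numba, rather than working around the problem?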
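To make the "array of arrays" preprocessing concrete, here is a minimal sketch of one way it might look. It is not necessarily the exact code that was tried. It assumes ASCII-only strings and non-negative clip counts, and the helper names to_flat_buffer, clip_offsets and from_flat_buffer are hypothetical. The strings are packed into one flat uint8 buffer plus an offsets array, so clipping only moves each start offset forward and never allocates a new string:

```python
import numpy as np

def to_flat_buffer(strings):
    """Pack ASCII strings into one uint8 buffer plus offsets (hypothetical helper)."""
    encoded = [s.encode("ascii") for s in strings]  # assumption: ASCII-only input
    lengths = np.array([len(b) for b in encoded], dtype=np.int64)
    # offsets[i] is where string i starts; offsets[i + 1] is where it ends
    offsets = np.zeros(len(encoded) + 1, dtype=np.int64)
    np.cumsum(lengths, out=offsets[1:])
    buffer = np.frombuffer(b"".join(encoded), dtype=np.uint8)
    return buffer, offsets

def clip_offsets(offsets, counts):
    """Return new start offsets after dropping counts[i] leading characters."""
    lengths = offsets[1:] - offsets[:-1]
    counts = np.asarray(counts, dtype=np.int64)  # assumption: counts >= 0
    # never move a start past the end of its own string
    return offsets[:-1] + np.minimum(counts, lengths)

def from_flat_buffer(buffer, starts, offsets):
    """Rebuild Python strings from the buffer and the clipped start offsets."""
    ends = offsets[1:]
    return [buffer[s:e].tobytes().decode("ascii") for s, e in zip(starts, ends)]

buffer, offsets = to_flat_buffer(['abc', 'def', 'xyz'])
starts = clip_offsets(offsets, [1, 2, 3])
print(from_flat_buffer(buffer, starts, offsets))
```

The offset arithmetic in clip_offsets is plain integer work on numeric arrays, which is the kind of operation that does work in a kernel. The packing and unpacking steps still run on the host in pure Python, and that is exactly the preprocessing cost described above.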