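cudf and numba. My *.py file itself does not depend on numba. Before I installed the cudf-related packages, my code worked fine. After I installed them, running python3 -m cudf.pandas my_py_101.py produces the following error: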
[Actual result]
/usr/local/lib/python3.10/dist-packages/cudf/utils/_ptxcompiler.py:61: UserWarning: Error getting driver and runtime versions:
stdout:
stderr:
Traceback (most recent call last):
File "<string>", line 7, in <module>
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
self.cudaRuntimeGetVersion(ctypes.byref(rtver))
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
self._initialize()
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
self.lib = open_cudalib('cudart')
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 63, in open_cudalib
path = get_cudalib(lib)
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 55, in get_cudalib
libdir = get_cuda_paths()[dir_type].info
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 223, in get_cuda_paths
'nvvm': _get_nvvm_path(),
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 201, in _get_nvvm_path
candidates = find_lib('nvvm', path)
File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 44, in find_lib
return find_file(regex, libdir)
File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 56, in find_file
entries = os.listdir(ldir)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/nvvm/lib64'
Not patching Numba
warnings.warn(msg, UserWarning)
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
__import__(pkg_name)
File "/usr/local/lib/python3.10/dist-packages/cudf/__init__.py", line 10, in <module>
validate_setup()
File "/usr/local/lib/python3.10/dist-packages/cudf/utils/gpu_utils.py", line 95, in validate_setup
cuda_runtime_version = runtimeGetVersion()
File "/usr/local/lib/python3.10/dist-packages/rmm/_cuda/gpu.py", line 88, in runtimeGetVersion
major, minor = numba.cuda.runtime.get_version()
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
self.cudaRuntimeGetVersion(ctypes.byref(rtver))
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
self._initialize()
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
self.lib = open_cudalib('cudart')
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 63, in open_cudalib
path = get_cudalib(lib)
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 55, in get_cudalib
libdir = get_cuda_paths()[dir_type].info
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 223, in get_cuda_paths
'nvvm': _get_nvvm_path(),
File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 201, in _get_nvvm_path
candidates = find_lib('nvvm', path)
File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 44, in find_lib
return find_file(regex, libdir)
File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 56, in find_file
entries = os.listdir(ldir)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/nvvm/lib64'
[What I did]
My Docker environment is built from the following Dockerfile:
FROM ubuntu:22.04
FROM nvidia/cuda:12.0.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y wget && apt-get install curl -y && apt-get install unzip && apt-get install python3-pip -y
ENV PATH=$PATH:~/.local/bin:~/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
RUN pip install --extra-index-url=https://pypi.nvidia.com cudf-cu12==23.12.* dask-cudf-cu12==23.12.* cuml-cu12==23.12.* cugraph-cu12==23.12.*
RUN pip install numpy==1.24.3 pandas==1.5.3 Cython==3.0.6 scikit-learn==1.3.2 swifter==1.3.4 requests==2.28.2 numba==0.57.1 scikit-learn-intelex==2024.0.1
RUN pip install torch torchvision torchaudio
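The error appears to be related to the numba package. I checked the dependency page and found that cudf depends on numba>=0.57,numba<0.58, and I have numba==0.57.1. Note that my script contains no numba-related code. For cudf, CUDA 12.0 is required, and I use CUDA 12.0.1, which is the closest available version. The YAML file that launches the Docker container looks like this: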
apiVersion: batch/v1
kind: Job
metadata:
  name: test-cuda
  namespace: tom # job and pvc should be in the same namespace
spec:
  template:
    metadata:
      labels:
        app: test-cuda
    spec:
      containers:
        - name: test-cuda
          image: <my_url>/tom/valid:cudf
          command: ["bash", "-c", "tail /proc/cpuinfo -n 28 &>> job.log; python3 -m cudf.pandas my_py_101.py &>> job.log; echo 'test my_py & GPU' &>> job.log; mkdir result_my_py_20231229 ; mv job.log result_my_py_20231229/ ; tar -cjf result_my_py_20231229.bz2 result_my_py_20231229/ ; ls *.bz2; pwd ; aws s3 cp --endpoint http://<my_url> /result_my_py_20231229.bz2 s3://mybucket01/"]
          resources:
            requests:
              cpu: 9
              memory: 128Gi
            limits:
              cpu: 12
              memory: 256Gi
          imagePullPolicy: IfNotPresent #Always
      restartPolicy: Never
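How can I fix it?
I have run into this problem before as a cuDF developer. I think you can solve it by changing one line in your Dockerfile. Try building your Docker image from the "devel" flavor of the CUDA container: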
FROM nvidia/cuda:12.0.1-devel-ubuntu22.04
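When you import cudf, it imports numba as a dependency. However, numba fails at import time because it finds only part of the CUDA Toolkit it requires. The "runtime" CUDA image is fairly small and lacks some of the NVVM components that Numba needs.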
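To confirm that this is what is happening inside a given image, a small check like the one below can be run before importing cudf. It is only a sketch: the function name find_missing_toolkit_dirs is hypothetical, the default path and the nvvm/lib64 subdirectory are taken from the FileNotFoundError in the traceback above, and reading CUDA_HOME is an assumption about how the toolkit location is configured in your environment.

```python
import os


def find_missing_toolkit_dirs(cuda_home=None, required=("nvvm/lib64",)):
    """Return the required CUDA Toolkit directories that do not exist.

    `required` defaults to the single directory named in the traceback;
    it is an assumption, not a complete list of what Numba looks for.
    """
    if cuda_home is None:
        # Assumption: honor CUDA_HOME if set, else use the path from the traceback.
        cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    paths = [os.path.join(cuda_home, sub) for sub in required]
    return [p for p in paths if not os.path.isdir(p)]


if __name__ == "__main__":
    missing = find_missing_toolkit_dirs()
    if missing:
        print("Missing CUDA Toolkit directories:", ", ".join(missing))
    else:
        print("The NVVM directory from the traceback is present.")
```

In the "runtime" image from the question this would be expected to report /usr/local/cuda/nvvm/lib64 as missing, since that is the directory os.listdir failed on.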
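Background: The cuDF library supports user-defined functions (UDFs) to enable features such as df.apply. To execute user-defined Python code on the GPU, cuDF calls Numba to perform just-in-time (JIT) CUDA compilation. Numba needs certain parts of the CUDA Toolkit to do this, including NVVM. The CUDA Toolkit shipped with the "runtime" flavor of the nvidia/cuda images does not include all the required parts, because NVVM and the related tools that Numba needs are considered compilers. The goal of the "runtime" flavor is to keep the Docker image as small as possible, so compilers are left out. The "devel" flavor does include NVVM, along with all the other components needed to build CUDA code.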
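For illustration, this is roughly what a UDF of the kind described above looks like. It is a minimal sketch: the column names a and b and the function names are made up for the example, and apply_on_gpu can only run on a machine with a GPU, cudf installed, and a CUDA Toolkit that includes NVVM (for example the "devel" image).

```python
def add_columns(row):
    # Plain Python; with cuDF, this body is what gets JIT-compiled for the GPU.
    return row["a"] + row["b"]


def apply_on_gpu():
    # Imported lazily so that defining the UDF does not require a GPU.
    import cudf

    df = cudf.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
    # Row-wise apply: this is the step that relies on Numba and NVVM.
    return df.apply(add_columns, axis=1)
```

The script in the question does not have to call df.apply for the failure to appear. The traceback shows the error being raised from validate_setup() while cudf itself is being imported.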