cudf and numba version conflict


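I installed cudf and numba. My *.py file itself does not depend on numba. Before I installed the cudf-related packages, my code worked fine. After installing them, running

python3 -m cudf.pandas my_py_101.py

produces the following error: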

[Actual result]

/usr/local/lib/python3.10/dist-packages/cudf/utils/_ptxcompiler.py:61: UserWarning: Error getting driver and runtime versions:

stdout:

stderr:

Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
    self.cudaRuntimeGetVersion(ctypes.byref(rtver))
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
    self._initialize()
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
    self.lib = open_cudalib('cudart')
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 63, in open_cudalib
    path = get_cudalib(lib)
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 55, in get_cudalib
    libdir = get_cuda_paths()[dir_type].info
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 223, in get_cuda_paths
    'nvvm': _get_nvvm_path(),
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 201, in _get_nvvm_path
    candidates = find_lib('nvvm', path)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 44, in find_lib
    return find_file(regex, libdir)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 56, in find_file
    entries = os.listdir(ldir)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/nvvm/lib64'


Not patching Numba
  warnings.warn(msg, UserWarning)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/usr/local/lib/python3.10/dist-packages/cudf/__init__.py", line 10, in <module>
    validate_setup()
  File "/usr/local/lib/python3.10/dist-packages/cudf/utils/gpu_utils.py", line 95, in validate_setup
    cuda_runtime_version = runtimeGetVersion()
  File "/usr/local/lib/python3.10/dist-packages/rmm/_cuda/gpu.py", line 88, in runtimeGetVersion
    major, minor = numba.cuda.runtime.get_version()
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
    self.cudaRuntimeGetVersion(ctypes.byref(rtver))
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
    self._initialize()
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
    self.lib = open_cudalib('cudart')
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 63, in open_cudalib
    path = get_cudalib(lib)
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 55, in get_cudalib
    libdir = get_cuda_paths()[dir_type].info
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 223, in get_cuda_paths
    'nvvm': _get_nvvm_path(),
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 201, in _get_nvvm_path
    candidates = find_lib('nvvm', path)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 44, in find_lib
    return find_file(regex, libdir)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 56, in find_file
    entries = os.listdir(ldir)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/nvvm/lib64'

[What I did]

My Docker environment is built from the following Dockerfile:

FROM ubuntu:22.04
FROM nvidia/cuda:12.0.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y wget && apt-get install curl -y && apt-get install unzip && apt-get install python3-pip -y
ENV PATH=$PATH:~/.local/bin:~/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
RUN pip install --extra-index-url=https://pypi.nvidia.com cudf-cu12==23.12.* dask-cudf-cu12==23.12.* cuml-cu12==23.12.* cugraph-cu12==23.12.*
RUN pip install numpy==1.24.3 pandas==1.5.3 Cython==3.0.6 scikit-learn==1.3.2 swifter==1.3.4 requests==2.28.2 numba==0.57.1 scikit-learn-intelex==2024.0.1
RUN pip install torch torchvision torchaudio
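
  1. The error seems to be related to the numba package. I checked the dependency page and found that cudf depends on numba>=0.57,<0.58, and I have numba==0.57.1. Note that my script does not contain any numba-related code.
  2. cudf requires CUDA 12.0, so I use the CUDA 12.0.1 image, which is the closest version available.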

The YAML file that launches the Docker container looks like this:

apiVersion: batch/v1
kind: Job
metadata:
        name: test-cuda
        namespace: tom # job and pvc should be in the same namespace
spec:
        template:
                metadata:
                        labels:
                                app: test-cuda
                spec:
                        containers:
                        - name: test-cuda
                          image: <my_url>/tom/valid:cudf
                          command: ["bash", "-c", "tail /proc/cpuinfo -n 28 &>> job.log; python3 -m cudf.pandas my_py_101.py &>> job.log; echo 'test my_py & GPU' &>> job.log; mkdir result_my_py_20231229 ; mv job.log result_my_py_20231229/ ; tar -cjf result_my_py_20231229.bz2 result_my_py_20231229/ ; ls *.bz2; pwd ; aws s3 cp --endpoint http://<my_url> /result_my_py_20231229.bz2  s3://mybucket01/"]
                          resources:
                                requests:
                                        cpu: 9
                                        memory: 128Gi
                                limits:
                                        cpu: 12
                                        memory: 256Gi
                          imagePullPolicy: IfNotPresent #Always
                        restartPolicy: Never

How can I fix it?

1 Answer

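I've run into this problem before as a cuDF developer. I think you can fix it by changing one line in your Dockerfile. Try building your Docker image from the "devel" flavor of the CUDA container: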

FROM nvidia/cuda:12.0.1-devel-ubuntu22.04

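When you import cudf, it imports numba as a dependency. However, numba fails at import time because it finds only part of its CUDA Toolkit requirements. The "runtime" CUDA image is quite small and lacks some of the NVVM components that Numba needs. That matches the FileNotFoundError for '/usr/local/cuda/nvvm/lib64' at the end of your traceback.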
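
To confirm which image flavor you are actually running, a quick check like the one below can help. It is only a sketch: the helper name nvvm_dir_status is made up for illustration, the nvvm/lib64 layout is taken from the path in the traceback above, and falling back to /usr/local/cuda when CUDA_HOME is unset is an assumption about your container rather than a description of Numba's full search logic.

```python
import os
from pathlib import Path


def nvvm_dir_status(cuda_home):
    """Return (path, exists) for the NVVM library directory under cuda_home.

    Hypothetical helper: it only checks the directory named in the traceback
    and does not reproduce Numba's own library search.
    """
    nvvm_dir = Path(cuda_home) / "nvvm" / "lib64"
    return str(nvvm_dir), nvvm_dir.is_dir()


if __name__ == "__main__":
    # Assumption: the toolkit lives at CUDA_HOME, or /usr/local/cuda if unset.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    path, exists = nvvm_dir_status(cuda_home)
    print(f"{path}: {'found' if exists else 'missing'}")
```

Running this inside the container built from the "runtime" image should report the directory as missing, given the error above; after switching to the "devel" image it should report it as found.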

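Background: the cuDF library supports user-defined functions (UDFs) for features such as df.apply. To execute user-defined Python code on the GPU, cuDF calls Numba to perform just-in-time (JIT) CUDA compilation. Numba needs certain parts of the CUDA Toolkit to do this, including NVVM. The CUDA Toolkit that ships in the "runtime" flavor of the nvidia/cuda images does not contain all the required pieces, because NVVM and the related tools that Numba needs are considered compilers. The "runtime" flavor aims for the smallest possible Docker image size, so compilers are left out. The "devel" flavor does include NVVM, along with all the other components needed to build CUDA code.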
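
For illustration, here is the kind of row-wise UDF the background above refers to. This is a minimal sketch, not code from the question: the column names and the add_cols function are invented, and the pandas fallback is there only so the snippet still runs on a machine without a working GPU setup. When cudf imports successfully, the df.apply call is the step that would go through Numba's JIT compilation.

```python
try:
    # Needs a GPU plus the CUDA Toolkit pieces that Numba looks for.
    import cudf as xdf
except Exception:
    # CPU fallback (assumption: pandas is installed) so the sketch still runs.
    import pandas as xdf

df = xdf.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})


def add_cols(row):
    # Row-wise user-defined function; with cuDF this is what gets JIT-compiled.
    return row["a"] + row["b"]


result = df.apply(add_cols, axis=1)

# cuDF Series expose to_pandas(); a pandas Series is used as-is.
values = result.to_pandas().tolist() if hasattr(result, "to_pandas") else result.tolist()
print(values)
```

So even if your own script never mentions numba, any code path that reaches a UDF like this depends on Numba finding NVVM.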
