How do I conda install CUDA-enabled PyTorch in a Docker container?

Question

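I'm trying to build a Docker container on a server, and a conda environment is built inside the container. All the other requirements are satisfied except CUDA-enabled PyTorch. PyTorch itself works fine without CUDA. How can I make sure PyTorch is using CUDA?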

Here is the Dockerfile:

# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

# set bash as current shell
RUN chsh -s /bin/bash

# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
        apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
        /bin/bash ~/anaconda.sh -b -p /opt/conda && \
        rm ~/anaconda.sh && \
        ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
        echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
        find /opt/conda/ -follow -type f -name '*.a' -delete && \
        find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
        /opt/conda/bin/conda clean -afy

# set path to conda
ENV PATH /opt/conda/bin:$PATH


# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
    && conda env create --name camera-seg -f /tmp/requirements.yaml \
    && conda install -y -c conda-forge -n camera-seg flake8

# From the pythonspeed tutorial; Make RUN commands use the new environment
SHELL ["conda", "run", "-n", "camera-seg", "/bin/bash", "-c"]

# PyTorch with CUDA 10.2
RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

RUN echo "conda activate camera-seg" > ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH

When I try to build this container with

docker build -t camera-seg .

I get the following error:

.....

Step 10/12 : RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
 ---> Running in e0dd3e648f7b
ERROR conda.cli.main_run:execute(34): Subprocess for 'conda run ['/bin/bash', '-c', 'conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch']' command failed.  (See above for error)

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.



The command 'conda run -n camera-seg /bin/bash -c conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch' returned a non-zero code: 1
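The error comes from calling conda activate in a non-interactive shell that was never set up with conda init. With the SHELL ["conda", "run", "-n", "camera-seg", ...] line above, every later RUN already executes inside camera-seg, so the activate step is not needed. Alternatively, conda install can target the environment by name with -n, which avoids activation entirely. A minimal Python sketch; the helper names conda_install_argv and as_dockerfile_run are hypothetical, for illustration only. It builds such a command and renders it as a shell-form RUN line:

```python
import shlex
from typing import List


def conda_install_argv(env: str, packages: List[str], channel: str) -> List[str]:
    """Build a `conda install` command that targets an environment by name.

    Using -n means no `conda activate` is needed, so the shell never has to
    be initialised with `conda init`.
    """
    return ["conda", "install", "-y", "-n", env, "-c", channel, *packages]


def as_dockerfile_run(argv: List[str]) -> str:
    """Render an argv list as a shell-form Dockerfile RUN line (Python 3.8+)."""
    return "RUN " + shlex.join(argv)


print(as_dockerfile_run(conda_install_argv(
    "camera-seg", ["pytorch", "torchvision", "cudatoolkit=10.2"], "pytorch")))
```

This prints a RUN line equivalent to the failing step, minus the conda activate part.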

Here is requirements.yaml:

name: camera-seg
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - numpy
  - pillow
  - yaml
  - pyyaml
  - matplotlib
  - jupyter
  - notebook
  - tensorboardx
  - tensorboard
  - protobuf
  - tqdm

When I put pytorch, torchvision and cudatoolkit=10.2 inside requirements.yaml, PyTorch installs successfully but does not recognize CUDA (torch.cuda.is_available() returns False).
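When is_available() returns False, it helps to tell the two usual causes apart. If torch.version.cuda is None, a CPU-only build was installed. If it names a CUDA version but is_available() is still False, the build is fine and the container probably cannot see the GPU (for example, it was started without --gpus all). A small diagnostic sketch; the function name cuda_report is hypothetical. It only uses torch.__version__, torch.version.cuda, torch.cuda.is_available() and torch.cuda.get_device_name():

```python
import importlib.util


def cuda_report() -> dict:
    """Summarise whether PyTorch is installed and whether it can see a GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the binary was built against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(cuda_report())
```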

I have tried various solutions, such as this, this and this, along with several combinations of them, but none of them worked.

Any help is greatly appreciated. Thanks.

Answer 1

After many attempts I finally got it working. I'm posting the answer here in case it helps anyone.

Basically, I installed pytorch and torchvision through pip (from within the conda environment), and installed the rest of the dependencies through conda as usual.

This is what the final Dockerfile looks like:

# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

# set bash as current shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]

# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
        apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
        /bin/bash ~/anaconda.sh -b -p /opt/conda && \
        rm ~/anaconda.sh && \
        ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
        echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
        find /opt/conda/ -follow -type f -name '*.a' -delete && \
        find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
        /opt/conda/bin/conda clean -afy

# set path to conda
ENV PATH /opt/conda/bin:$PATH


# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
    && conda env create --name camera-seg -f /tmp/requirements.yaml

RUN echo "conda activate camera-seg" >> ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH
ENV CONDA_DEFAULT_ENV camera-seg
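The ENV PATH line is what makes the camera-seg environment the default for every RUN and CMD. Commands run through SHELL ["/bin/bash", "-c"] are non-interactive, and a non-interactive bash does not read ~/.bashrc, so the "conda activate camera-seg" line in ~/.bashrc only takes effect in interactive shells (e.g. docker run -it). A small sketch, using a hypothetical helper first_conda_env and assuming the /opt/conda root used here, that reads a PATH value and reports which conda environment's bin directory comes first:

```python
import os
from typing import Optional


def first_conda_env(path_value: str, conda_root: str = "/opt/conda") -> Optional[str]:
    """Return the conda env whose bin dir appears first on PATH.

    Returns "base" for <conda_root>/bin and None if no conda dir is on PATH.
    """
    envs_prefix = os.path.join(conda_root, "envs") + os.sep
    base_bin = os.path.join(conda_root, "bin")
    bin_suffix = os.sep + "bin"
    for entry in path_value.split(os.pathsep):
        entry = entry.rstrip(os.sep)
        if entry.startswith(envs_prefix) and entry.endswith(bin_suffix):
            # Strip "<root>/envs/" and "/bin" to get the environment name
            return entry[len(envs_prefix):-len(bin_suffix)]
        if entry == base_bin:
            return "base"
    return None


print(first_conda_env(os.environ.get("PATH", "")))
```

Inside the image above, this should report camera-seg, because /opt/conda/envs/camera-seg/bin is prepended after /opt/conda/bin.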

And this is what requirements.yaml looks like:

name: camera-seg
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - pip
  - numpy
  - pillow
  - yaml
  - pyyaml
  - matplotlib
  - jupyter
  - notebook
  - tensorboardx
  - tensorboard
  - protobuf
  - tqdm
  - pip:
    - torch
    - torchvision

I then build the container with

docker build -t camera-seg .

and PyTorch is now able to recognize CUDA.
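The key part of this requirements.yaml is the nested pip: list. conda creates the environment and then uses that environment's own pip to install torch and torchvision into it. As a sketch, the hypothetical helper render_env_yaml below writes such a file from Python using plain string formatting (no YAML library assumed). It lists pip explicitly, since conda warns when a pip: block appears without pip among the conda dependencies:

```python
from typing import List


def render_env_yaml(name: str, channels: List[str],
                    conda_deps: List[str], pip_deps: List[str]) -> str:
    """Render a conda environment file with an optional nested pip section."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {c}" for c in channels]
    lines.append("dependencies:")
    lines += [f"  - {d}" for d in conda_deps]
    if pip_deps:
        # Make sure the environment has its own pip for the pip: block
        if "pip" not in conda_deps:
            lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"    - {p}" for p in pip_deps]
    return "\n".join(lines) + "\n"


print(render_env_yaml("camera-seg", ["defaults", "conda-forge"],
                      ["python=3.6", "pip", "numpy", "tqdm"],
                      ["torch", "torchvision"]))
```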

Answer 2

I managed to set it up with the following Dockerfile:

FROM nvidia/cuda:11.3.1-devel-ubuntu20.04
ENV TZ=Europe/Brussels

RUN apt-get update --fix-missing && DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes --no-install-recommends \
   build-essential \
   python3 \
   python3-dev \
   python3-pip

RUN pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

I made sure the CUDA version is the same as the one installed on the machine that runs the Docker container.
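The --extra-index-url in this Dockerfile encodes the CUDA version in its last path segment (cu116 for CUDA 11.6). A small sketch, with hypothetical helpers pytorch_wheel_index and pip_install_argv, that derives that URL from a version string and rebuilds the pip command. It assumes the cuXY naming seen above and does not check that the index actually exists, since PyTorch only publishes some CUDA versions. Note that the base image here is CUDA 11.3.1 while the wheels are cu116. PyTorch's CUDA wheels ship their own CUDA runtime, so the constraint that usually matters is that the host's NVIDIA driver supports the wheel's CUDA version:

```python
from typing import List


def pytorch_wheel_index(cuda_version: str) -> str:
    """Turn a CUDA version such as "11.6" or "11.3.1" into a wheel index URL.

    Assumes the cuXY naming of https://download.pytorch.org/whl/cu116.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


def pip_install_argv(cuda_version: str) -> List[str]:
    """Rebuild the pip command from the Dockerfile for a given CUDA version."""
    return ["pip", "install", "torch", "torchvision", "torchaudio",
            "--extra-index-url", pytorch_wheel_index(cuda_version)]


print(" ".join(pip_install_argv("11.6")))
```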

Then I built and ran the container as follows:

$ docker build . -t docker-example:latest
$ docker run --gpus all --interactive --tty docker-example:latest

Inside the Docker container, in a Python shell,

torch.cuda.is_available()

returns True.
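The same check can also be run non-interactively, which is handy right after a build. A minimal sketch with a hypothetical helper gpu_check_argv. It builds a docker run command that uses the same --gpus all flag, runs python3 -c with the is_available() check, and removes the container afterwards with --rm. Actually running it requires Docker and the NVIDIA Container Toolkit on the host:

```python
from typing import List

# One-line check executed by python3 inside the container
CHECK = "import torch; print(torch.cuda.is_available())"


def gpu_check_argv(image: str) -> List[str]:
    """Build a `docker run` command that runs the CUDA check and exits."""
    return ["docker", "run", "--rm", "--gpus", "all", image,
            "python3", "-c", CHECK]


print(gpu_check_argv("docker-example:latest"))
```

Pass the list to subprocess.run(gpu_check_argv("docker-example:latest"), capture_output=True, text=True) and inspect stdout for True.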

Answer 3

I created an image with conda + PyTorch + GPU setup + code-server: https://hub.docker.com/r/mhadhbixissam/ubuntu-conda-pytorch
