如何在 Docker 容器中 conda 安装启用 CUDA 的 PyTorch?

问题描述 投票:0回答:3

我正在尝试在构建了 conda 环境的服务器上构建 Docker 容器。除了支持 CUDA 的 PyTorch 之外,所有其他要求都得到满足(但是我可以在没有 CUDA 的情况下让 PyTorch 工作,没有问题)。如何确保 PyTorch 正在使用 CUDA?

这是

Dockerfile

# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

# set bash as current shell
RUN chsh -s /bin/bash

# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
        apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
        /bin/bash ~/anaconda.sh -b -p /opt/conda && \
        rm ~/anaconda.sh && \
        ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
        echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
        find /opt/conda/ -follow -type f -name '*.a' -delete && \
        find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
        /opt/conda/bin/conda clean -afy

# set path to conda
ENV PATH /opt/conda/bin:$PATH


# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
    && conda env create --name camera-seg -f /tmp/requirements.yaml \
    && conda install -y -c conda-forge -n camera-seg flake8

# From the pythonspeed tutorial; Make RUN commands use the new environment
SHELL ["conda", "run", "-n", "camera-seg", "/bin/bash", "-c"]

# PyTorch with CUDA 10.2
RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

RUN echo "conda activate camera-seg" > ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH

当我尝试构建此容器时,这给了我以下错误(

docker build -t camera-seg .
):

.....

Step 10/12 : RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
 ---> Running in e0dd3e648f7b
ERROR conda.cli.main_run:execute(34): Subprocess for 'conda run ['/bin/bash', '-c', 'conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch']' command failed.  (See above for error)

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.



The command 'conda run -n camera-seg /bin/bash -c conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch' returned a non-zero code: 1

这是

requirements.yaml
:

name: camera-seg
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - numpy
  - pillow
  - yaml
  - pyyaml
  - matplotlib
  - jupyter
  - notebook
  - tensorboardx
  - tensorboard
  - protobuf
  - tqdm

当我将

pytorch
torchvision
cudatoolkit=10.2
放入
requirements.yaml
内时,PyTorch已成功安装,但无法识别CUDA(
torch.cuda.is_available()
返回
False
)。

我尝试了各种解决方案,例如thisthisthis以及它们的一些不同组合,但都无济于事。

非常感谢任何帮助。谢谢。

python-3.x docker anaconda pytorch
3个回答
19
投票

经过多次尝试,我终于成功了。将答案发布在这里,以防对任何人有帮助。

基本上,我通过

pytorch
(在
torchvision
环境中)安装了
pip
conda
,并像往常一样通过
conda
安装了其余依赖项。

这就是最终的

Dockerfile
的样子:

# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

# set bash as current shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]

# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
        apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
        /bin/bash ~/anaconda.sh -b -p /opt/conda && \
        rm ~/anaconda.sh && \
        ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
        echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
        find /opt/conda/ -follow -type f -name '*.a' -delete && \
        find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
        /opt/conda/bin/conda clean -afy

# set path to conda
ENV PATH /opt/conda/bin:$PATH


# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
    && conda env create --name camera-seg -f /tmp/requirements.yaml

RUN echo "conda activate camera-seg" >> ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH
ENV CONDA_DEFAULT_ENV $camera-seg

这就是

requirements.yaml
的样子:

name: camera-seg
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - pip
  - numpy
  - pillow
  - yaml
  - pyyaml
  - matplotlib
  - jupyter
  - notebook
  - tensorboardx
  - tensorboard
  - protobuf
  - tqdm
  - pip:
    - torch
    - torchvision

然后我使用命令

docker build -t camera-seg .
构建容器,PyTorch 现在能够识别 CUDA。


5
投票

我设法使用以下 Dockerfile 设置它:

FROM nvidia/cuda:11.3.1-devel-ubuntu20.04
ENV TZ=Europe/Brussels

RUN apt-get update --fix-missing && DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes --no-install-recommends \
   build-essential \
   python3 \
   python3-dev \
   python3-pip

RUN pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

我确保 cuda 版本与运行 docker 容器的计算机上安装的版本相同。

然后我进行了 docker 构建并运行如下:

$ docker build . -t docker-example:latest
$ docker run --gpus all --interactive --tty docker-example:latest

在 docker 容器内,在 python shell 内,

torch.cuda.is_available()
将返回
True


0
投票

我创建的图像有 conda + pytorch + gpu 设置 + 代码服务器:https://hub.docker.com/r/mhadhbixissam/ubuntu-conda-pytorch

© www.soinside.com 2019 - 2024. All rights reserved.