Unable to install dgl-cu<any version> in Google Colab

Question (votes: 0, answers: 2)

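I'm trying to run a graph model with dgl in Google Colab, but I keep hitting errors when training the model. I believe the main problem is that I cannot install the CUDA build of dgl with: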

!pip install dgl-cu111

I get the following error:

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: Could not find a version that satisfies the requirement dgl-cu111 (from versions: none)
ERROR: No matching distribution found for dgl-cu111

When training the model, I get the following error:

load_done
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py:71: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
  warnings.warn("dropout option adds dropout after all but last "
init done
Epoch 1:   0%|          | 0/49 [00:00<?, ?it/s]
---------------------------------------------------------------------------
DGLError                                  Traceback (most recent call last)
<ipython-input-7-783797e86ab0> in <cell line: 209>()
    207 
    208 
--> 209 train(model)
    210 test_func(model, y_test, X_test)

3 frames
<ipython-input-7-783797e86ab0> in train(net)
    179                         gc.collect()
    180                         continue
--> 181                     acc, loss, _ = fwd_pass(batch_X, batch_y, train=True)
    182 
    183                     losses.append(loss.item())

<ipython-input-7-783797e86ab0> in fwd_pass(X, y, train)
    108     for item in X:
    109         x = [0, 0]
--> 110         x[0] = item[0].to(device)
    111         x[1] = item[1].to(device)
    112         out.append(model(x))

/usr/local/lib/python3.10/dist-packages/dgl/heterograph.py in to(self, device, **kwargs)
   5707 
   5708         # 1. Copy graph structure
-> 5709         ret._graph = self._graph.copy_to(utils.to_dgl_context(device))
   5710 
   5711         # 2. Copy features

/usr/local/lib/python3.10/dist-packages/dgl/heterograph_index.py in copy_to(self, ctx)
    253             The graph index on the given device context.
    254         """
--> 255         return _CAPI_DGLHeteroCopyTo(self, ctx.device_type, ctx.device_id)
    256 
    257     def pin_memory(self):

dgl/_ffi/_cython/./function.pxi in dgl._ffi._cy3.core.FunctionBase.__call__()

dgl/_ffi/_cython/./function.pxi in dgl._ffi._cy3.core.FuncCall()

dgl/_ffi/_cython/./function.pxi in dgl._ffi._cy3.core.FuncCall3()

DGLError: [01:00:58] /opt/dgl/src/runtime/c_runtime_api.cc:82: Check failed: allow_missing: Device API cuda is not enabled. Please install the cuda version of dgl.
Stack trace:
  [bt] (0) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x75) [0x7fae2b978e55]
  [bt] (1) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::DeviceAPIManager::GetAPI(std::string, bool)+0x1f2) [0x7fae2bcf85f2]
  [bt] (2) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::DeviceAPI::Get(DGLContext, bool)+0x1e1) [0x7fae2bcf2ba1]
  [bt] (3) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::NDArray::Empty(std::vector<long, std::allocator<long> >, DGLDataType, DGLContext)+0x13b) [0x7fae2bd15acb]
  [bt] (4) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::NDArray::CopyTo(DGLContext const&) const+0xc3) [0x7fae2bd4fe23]
  [bt] (5) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::UnitGraph::CopyTo(std::shared_ptr<dgl::BaseHeteroGraph>, DGLContext const&)+0x3ef) [0x7fae2be5d79f]
  [bt] (6) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::HeteroGraph::CopyTo(std::shared_ptr<dgl::BaseHeteroGraph>, DGLContext const&)+0xf6) [0x7fae2bd61286]
  [bt] (7) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(+0x52cbb6) [0x7fae2bd70bb6]
  [bt] (8) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7fae2bcf7bb8]

Any ideas on how to install the GPU build of dgl on Google Colab? I'm using Colab's A100 GPU:

(nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2022 NVIDIA Corporation Built on Wed_Sep_21_10:33:58_PDT_2022 Cuda compilation tools, release 11.8, V11.8.89 Build cuda_11.8.r11.8/compiler.31833905_0)

python pytorch google-colaboratory dgl
2 Answers
2
votes

I ran into this problem with a V100 GPU. My fix was to specify the wheel source explicitly:

pip install dgl==1.0.1+cu117 -f https://data.dgl.ai/wheels/cu117/repo.html

Make sure to choose the CUDA version that matches your setup.
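
To make the version matching concrete, here is a minimal sketch that builds the install command from a CUDA version string. It assumes the wheel index follows the same URL pattern as the command above (`.../wheels/cu117/repo.html`); that pattern is extrapolated from this single example, and an index may not exist for every CUDA version. The helper names `cuda_tag`, `dgl_wheel_index` and `dgl_install_command` are hypothetical, not part of dgl.

```python
def cuda_tag(cuda_version: str) -> str:
    """Turn a CUDA version such as '11.7' into a wheel tag such as 'cu117'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def dgl_wheel_index(cuda_version: str) -> str:
    """Build the wheel index URL (assumed pattern, taken from the answer above)."""
    return f"https://data.dgl.ai/wheels/{cuda_tag(cuda_version)}/repo.html"


def dgl_install_command(cuda_version: str) -> str:
    """Build a pip command that looks for dgl wheels on the assumed index."""
    return f"pip install dgl -f {dgl_wheel_index(cuda_version)}"


print(dgl_install_command("11.7"))
```

In a notebook, `torch.version.cuda` reports the CUDA version PyTorch was built against (or `None` for a CPU-only build), so it could serve as the argument. Check that the resulting index page actually exists before relying on it.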


0
votes

This answer is for anyone trying to match pytorch and dgl versions.

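After a lot of back-and-forth trying to match python, pytorch and cuda versions [1], the following steps worked for me. (Starting from a fresh environment is easier, because the packages can conflict in many ways.)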

[1] - https://www.dgl.ai/pages/start.html

## Create new environment, use arbitrary name "myenv" that you prefer
conda create -n myenv python=3.11

## Activate environment
source activate myenv

## Install pytorch 2.2 
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia

## Install dgl which matches pytorch 2.2 and cuda 12.1 
conda install -c dglteam/label/cu121 dgl

## Add environment to jupyter kernel
conda install -c anaconda ipykernel -y
python -m ipykernel install --user --name=myenv

# install remaining things that dgl needs
pip install torchdata
pip install pandas
pip install pyyaml
pip install pydantic
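
Once the steps above have run, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and a package installed without pip-style metadata would be reported as not installed.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if no metadata is found}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The packages installed by the steps above.
report = installed_versions(
    ["torch", "dgl", "torchdata", "pandas", "pyyaml", "pydantic"]
)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Run this inside the `myenv` kernel so that it inspects the new environment rather than the base one.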

Sample code

import dgl
import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GraphConv


class Classifier(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.conv1 = GraphConv(in_dim, out_dim)

    def forward(self, g, h):
        # Apply graph convolution and activation.
        h = F.relu(self.conv1(g, h))
        return h


# Edges 2->1, 3->2, 4->3 give a graph with 5 nodes (IDs 0 to 4).
src_ids = torch.tensor([2, 3, 4])
dst_ids = torch.tensor([1, 2, 3])
device = torch.device('cuda:0')
g = dgl.graph((src_ids, dst_ids)).to(device)
# Self-loops avoid zero in-degree nodes, which GraphConv rejects by default.
g = dgl.add_self_loop(g)
x = torch.randn((5, 100)).to(device)  # one 100-dim feature vector per node
model = Classifier(100, 20).to(device)
model(g, x)