Pytorch:RuntimeError:CUDA 错误:在 CUDA 11.7 上触发设备端断言

问题描述 投票:0回答:0

我在同一环境中训练两个模型,其中一个在相同配置下工作正常,但另一个在没有任何进一步原因的情况下出错。我也添加了

CUDA_LAUNCH_BLOCKING=1
,但问题仍然存在。

脚本命令

CUDA_LAUNCH_BLOCKING=1 python3 train_pytorch.py sar_resnet31 --train_path /media/cvpr/CM_24/KR_2M/train --val_path /media/cvpr/CM_24/KR_2M/val --font "ko/NanumBarunGothicBold.ttf" --batch_size 56 --epochs 10

回溯

Validation set loaded in 0.0005977s (81 samples in 2 batches)
Train set loaded in 7.846s (3000000 samples in 53571 batches)
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [88,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/media/cvpr/CM_1/doctr-main/references/recognition/train_pytorch.py", line 470, in <module>
    main(args)
  File "/media/cvpr/CM_1/doctr-main/references/recognition/train_pytorch.py", line 381, in main
    fit_one_epoch(model, train_loader, batch_transforms, optimizer, scheduler, mb, amp=args.amp)
  File "/media/cvpr/CM_1/doctr-main/references/recognition/train_pytorch.py", line 121, in fit_one_epoch
    train_loss = model(images, targets)["loss"]
  File "/home/cvpr/anaconda3/envs/pytesseract/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/cvpr/CM_1/doctr-main/doctr/models/recognition/sar/pytorch.py", line 242, in forward
    decoded_features = self.decoder(features, encoded, gt=None if target is None else gt)
  File "/home/cvpr/anaconda3/envs/pytesseract/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/cvpr/CM_1/doctr-main/doctr/models/recognition/sar/pytorch.py", line 119, in forward
    gt_embedding = self.embed_tgt(gt)
  File "/home/cvpr/anaconda3/envs/pytesseract/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cvpr/anaconda3/envs/pytesseract/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/home/cvpr/anaconda3/envs/pytesseract/lib/python3.9/site-packages/torch/nn/functional.py", line 2044, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
pytorch cuda torch cudnn
© www.soinside.com 2019 - 2024. All rights reserved.