RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false


I want to run local fine-tuning of LLaMA. I followed the Colab notebook from the PyTorch blog post "Finetune LLMs on your own consumer hardware using tools from PyTorch and Hugging Face ecosystem".

I got everything up and running, but during training I hit a runtime error:

RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

My understanding is that PyTorch requires a specific compute capability here, sm_80 or sm_90. The RTX 2080 Ti itself is only sm_75, although the PyTorch build does include sm_80 and sm_90 architecture flags.
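A quick way to confirm what the card itself reports (my own check, not from the blog post) is to query the device's compute capability:

    import torch

    # The RTX 2080 Ti should report (7, 5), i.e. sm_75.
    print(torch.cuda.get_device_name(0))
    print(torch.cuda.get_device_capability(0))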

I checked the configuration of the PyTorch build on my RTX 2080 Ti machine with

print(torch.__config__.show().replace("\n", "\n\t"))

and got this:

PyTorch built with:
      - GCC 9.3
      - C++ Version: 201703
      - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - LAPACK is enabled (usually provided by MKL)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 12.1
      - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
      - CuDNN 8.9.2
      - Magma 2.6.1
      - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 
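The architecture list in the NVCC flags above can also be read programmatically; a small check (assuming a CUDA-enabled PyTorch build) is:

    import torch

    # CUDA architectures this PyTorch binary ships kernels for,
    # e.g. ['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
    print(torch.cuda.get_arch_list())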

What can I do to make training work, or does the card simply not support it?

The full traceback is here:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[19], line 2
      1 ## start training
----> 2 trainer.train()

File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:323, in SFTTrainer.train(self, *args, **kwargs)
    320 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:
    321     self.model = self._trl_activate_neftune(self.model)
--> 323 output = super().train(*args, **kwargs)
    325 # After training we make sure to retrieve back the original forward pass method
    326 # for the embedding layer by removing the forward post hook.
    327 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:

File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/transformers/trainer.py:1539, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1537         hf_hub_utils.enable_progress_bars()
   1538 else:
-> 1539     return inner_training_loop(
   1540         args=args,
   1541         resume_from_checkpoint=resume_from_checkpoint,
   1542         trial=trial,
   1543         ignore_keys_for_eval=ignore_keys_for_eval,
   1544     )

File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/transformers/trainer.py:1869, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1866     self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
   1868 with self.accelerator.accumulate(model):
-> 1869     tr_loss_step = self.training_step(model, inputs)
   1871 if (
   1872     args.logging_nan_inf_filter
   1873     and not is_torch_tpu_available()
   1874     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   1875 ):
   1876     # if loss is nan or inf simply add the average of previous logged losses
   1877     tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/transformers/trainer.py:2777, in Trainer.training_step(self, model, inputs)
   2775         scaled_loss.backward()
   2776 else:
-> 2777     self.accelerator.backward(loss)
   2779 return loss.detach() / self.args.gradient_accumulation_steps

File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/accelerate/accelerator.py:1964, in Accelerator.backward(self, loss, **kwargs)
   1962     self.scaler.scale(loss).backward(**kwargs)
   1963 else:
-> 1964     loss.backward(**kwargs)

File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/torch/_tensor.py:492, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    482 if has_torch_function_unary(self):
    483     return handle_torch_function(
    484         Tensor.backward,
    485         (self,),
   (...)
    490         inputs=inputs,
    491     )
--> 492 torch.autograd.backward(
    493     self, gradient, retain_graph, create_graph, inputs=inputs
    494 )

File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/torch/autograd/__init__.py:251, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    246     retain_graph = create_graph
    248 # The reason we repeat the same comment below is that
    249 # some Python versions print out the first line of a multi-line function
    250 # calls in the traceback and some print out the last line
--> 251 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    252     tensors,
    253     grad_tensors_,
    254     retain_graph,
    255     create_graph,
    256     inputs,
    257     allow_unreachable=True,
    258     accumulate_grad=True,
    259 )

File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/torch/autograd/function.py:288, in BackwardCFunction.apply(self, *args)
    282     raise RuntimeError(
    283         "Implementing both 'backward' and 'vjp' for a custom "
    284         "Function is not allowed. You should only implement one "
    285         "of them."
    286     )
    287 user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn
--> 288 return user_fn(self, *args)

File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/torch/utils/checkpoint.py:288, in CheckpointFunction.backward(ctx, *args)
    283 if len(outputs_with_grad) == 0:
    284     raise RuntimeError(
    285         "none of output has requires_grad=True,"
    286         " this checkpoint() is not necessary"
    287     )
--> 288 torch.autograd.backward(outputs_with_grad, args_with_grad)
    289 grads = tuple(
    290     inp.grad if isinstance(inp, torch.Tensor) else None
    291     for inp in detached_inputs
    292 )
    294 return (None, None) + grads

File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/torch/autograd/__init__.py:251, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    246     retain_graph = create_graph
    248 # The reason we repeat the same comment below is that
    249 # some Python versions print out the first line of a multi-line function
    250 # calls in the traceback and some print out the last line
--> 251 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    252     tensors,
    253     grad_tensors_,
    254     retain_graph,
    255     create_graph,
    256     inputs,
    257     allow_unreachable=True,
    258     accumulate_grad=True,
    259 )

RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

I have tried upgrading PyTorch to the latest version, restarting the PC, and restarting the Jupyter notebook kernel.

I have also installed the latest CUDA version and the NVIDIA toolkit.
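For completeness, this is the kind of sanity check I run after changing the CUDA/driver setup (my own snippet, not from the notebook):

    import torch

    print(torch.__version__)          # installed PyTorch version
    print(torch.version.cuda)         # CUDA runtime the wheel was built against, e.g. 12.1
    print(torch.cuda.is_available())  # True if the driver is new enough and a GPU is visible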

pytorch cuda nvidia huggingface-transformers large-language-model
1 Answer

Installing a specific version of PyTorch solved the problem for me.

This is what I used:

pip install --force-reinstall --pre torch --index-url https://download.pytorch.org/whl/nightly/cu117

From a comment on the GitHub issue: install the nightly build.

Thanks to @palonix for the hint in the GitHub issue!
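Not part of the original answer, but a short smoke test along these lines can confirm that the reinstalled build gets through an attention backward pass in fp16 on the GPU before relaunching the full fine-tuning run (it may not hit exactly the same kernel path as the trainer):

    import torch
    import torch.nn.functional as F

    # Hypothetical smoke test: a small scaled_dot_product_attention forward/backward in fp16 on CUDA.
    q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16, requires_grad=True)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    out = F.scaled_dot_product_attention(q, k, v)
    out.sum().backward()
    print("backward OK on", torch.cuda.get_device_name(0), "with torch", torch.__version__)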
