How do I resolve these SLURM errors?

#!/bin/bash
#SBATCH -J myjob                        # Job name
#SBATCH -o myjob.o%j                    # Name of stdout output file (%j corresponds to the job id)
#SBATCH -e myjob.e%j                    # Name of stderr error file (%j corresponds to the job id)
#SBATCH -p gpu-a100                   # Queue (partition) name
#SBATCH -N 1                            # Total # of nodes (must be 1 for serial)
#SBATCH -n 64                           # Number of cores
#SBATCH -t 24:00:00                     # Run time (hh:mm:ss)
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=all                 # Send email at begin and end of job (can assign begin or end as well)
#SBATCH -A CCR23005         # Allocation name (req'd if you have more than 1)

# Other commands must follow all #SBATCH directives...

cdw
python TypeT5/train_model.py

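This is my file, commands.slurm. After running

sbatch commands.slurm

I get this error message:

Pushover: (Failed: Training spot model.) You set `--ntasks=64` in your SLURM bash script, but this variable is not supported. HINT: Use `--ntasks-per-node=64`

However, I am not using --ntasks=64 anywhere. How can I fix this error?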
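A note on where the value may be coming from: in sbatch, `-n` is the short form of `--ntasks`, so the directive `#SBATCH -n 64` in the script above sets `--ntasks=64` even though the long spelling never appears. The wording of the message looks like a check made by the training code or a library it uses, not by SLURM itself; that is an assumption, since the source of the message is not shown here. The sketch below uses two hypothetical helper functions (`find_ntasks_directives` and `suggest_fix` are names made up for illustration) to locate such a directive and to print the script with the replacement that the hint proposes.

```shell
#!/bin/bash
# Hypothetical helpers, not part of SLURM.

# Print (with line numbers) every #SBATCH line that sets the total task
# count, written either as `-n N` or as `--ntasks=N` / `--ntasks N`.
# `-N` (nodes) and `--ntasks-per-node` are deliberately not matched.
find_ntasks_directives() {
    grep -nE '^#SBATCH[[:space:]]+(-n[[:space:]=]*|--ntasks[=[:space:]]+)[0-9]+' "$1"
}

# Print the script to stdout with each such directive rewritten to
# `--ntasks-per-node=N`, as the error hint suggests. The input file is
# left unchanged so the result can be reviewed before it is used.
suggest_fix() {
    sed -E 's/^(#SBATCH[[:space:]]+)(-n[[:space:]=]*|--ntasks[=[:space:]]+)([0-9]+)/\1--ntasks-per-node=\3/' "$1"
}
```

Running `find_ntasks_directives commands.slurm` should point at the `-n 64` line, and `suggest_fix commands.slurm > commands.fixed.slurm` writes a candidate script for review. Whether 64 tasks per node is the right value for this job is a separate question that depends on the training code and the node type.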

bash slurm