#SBATCH -J myjob # Job name
#SBATCH -o myjob.o%j # Name of stdout output file (%j corresponds to the job id)
#SBATCH -e myjob.e%j # Name of stderr error file (%j corresponds to the job id)
#SBATCH -p gpu-a100 # Queue (partition) name
#SBATCH -N 1 # Total # of nodes (must be 1 for serial)
#SBATCH -n 64 # Number of cores
#SBATCH -t 24:00:00 # Run time (hh:mm:ss)
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=all # Send email at begin and end of job (can assign begin or end as well)
#SBATCH -A CCR23005 # Allocation name (req'd if you have more than 1)
# Other commands must follow all #SBATCH directives...
cdw
python TypeT5/train_model.py
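This is my file `commands.slurm`. After running `sbatch commands.slurm`, I get this error message:
Pushover: (Failed: Training spot model.) You set `--ntasks=64` in your SLURM bash script, but this variable is not supported. HINT: Use `--ntasks-per-node=64`
However, I am not using `--ntasks=64` anywhere. How can I fix this error?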
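
A possible explanation, offered as a sketch rather than a confirmed fix: in `sbatch`, `-n` is the short form of `--ntasks`, so the `#SBATCH -n 64` directive is what sets `--ntasks=64`. Following the hint in the message, that directive could be swapped for `--ntasks-per-node=64`, which requests the same 64 tasks on the single node selected by `-N 1`. This assumes the program printing the hint accepts the per-node form and that 64 tasks is what the training run actually needs; neither is confirmed by the post.

```shell
#SBATCH -N 1                    # Total # of nodes (unchanged)
#SBATCH --ntasks-per-node=64    # Replaces "-n 64", which is short for --ntasks=64
# All other #SBATCH directives and the commands after them stay as in commands.slurm
```

After editing, the job would be resubmitted the same way, with `sbatch commands.slurm`.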