How do I resolve these SLURM errors?

Question

#SBATCH -J myjob                        # Job name
#SBATCH -o myjob.o%j                    # Name of stdout output file (%j corresponds to the job id)
#SBATCH -e myjob.e%j                    # Name of stderr error file (%j corresponds to the job id)
#SBATCH -p gpu-a100                   # Queue (partition) name
#SBATCH -N 1                            # Total # of nodes (must be 1 for serial)
#SBATCH -n 64                           # Number of cores
#SBATCH -t 24:00:00                     # Run time (hh:mm:ss)
#SBATCH [email protected]
#SBATCH --mail-type=all                 # Send email at begin and end of job (can assign begin or end as well)
#SBATCH -A CCR23005         # Allocation name (req'd if you have more than 1)

# Other commands must follow all #SBATCH directives...

cdw
python TypeT5/train_model.py

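This is my file, `commands.slurm`. After running `sbatch commands.slurm`, I get this error message:

Pushover: (Failed: Training spot model.) You set `--ntasks=64` in your SLURM bash script, but this variable is not supported. HINT: Use `--ntasks-per-node=64`

However, I am not using `--ntasks=64` anywhere. How can I fix this error?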
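One likely cause, offered as a sketch rather than a confirmed diagnosis: in `sbatch`, `-n` is the short form of `--ntasks`, so the line `#SBATCH -n 64` in the script above is what sets `--ntasks=64`. When only `-n` is given, SLURM exports `SLURM_NTASKS` but not `SLURM_NTASKS_PER_NODE`, which is presumably what the check behind the message looks at (an assumption; the post does not say which library prints it). The fragment below swaps the directive as the hint suggests and adds a hypothetical helper, `report_slurm_tasks`, that prints both variables from inside the job.

```shell
#!/bin/bash
#SBATCH -N 1                     # One node, as in the original script
#SBATCH --ntasks-per-node=64     # Replaces "-n 64" (short form of --ntasks=64)

# Hypothetical helper (not part of the original script): show which
# task-related variables SLURM exported, or "unset" if absent.
report_slurm_tasks() {
  echo "SLURM_NTASKS=${SLURM_NTASKS:-unset}"
  echo "SLURM_NTASKS_PER_NODE=${SLURM_NTASKS_PER_NODE:-unset}"
}

# Call it before the training command to see what the job actually received.
report_slurm_tasks
```

Whether 64 is the right task count depends on how `train_model.py` launches its workers, which the post does not show; if the framework expects one task per GPU, the value may need to match the number of GPUs on the node.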
