Slurm only runs 1 job per node


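I am building a new Slurm cluster and I am not very familiar with how resources are allocated. I have 4 nodes, each with 32 cores. When I submit jobs, only 1 job runs on each node and the rest stay pending.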

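All of the jobs should be single-threaded and use only one core. How can I get the other jobs to run? Each node should be able to run 32 of them at the same time. Here is the output of squeue: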

         JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           249  computes freesurf     nidb PD       0:00      1 (Resources)
           250  computes freesurf     nidb PD       0:00      1 (Priority)
           251  computes freesurf     nidb PD       0:00      1 (Priority)
           252  computes freesurf     nidb PD       0:00      1 (Priority)
           253  computes freesurf     nidb PD       0:00      1 (Priority)
           254  computes freesurf     nidb PD       0:00      1 (Priority)
           255  computes freesurf     nidb PD       0:00      1 (Priority)
           256  computes freesurf     nidb PD       0:00      1 (Priority)
           257  computes freesurf     nidb PD       0:00      1 (Priority)
           258  computes freesurf     nidb PD       0:00      1 (Priority)
           259  computes freesurf     nidb PD       0:00      1 (Priority)
           260  computes freesurf     nidb PD       0:00      1 (Priority)
           261  computes freesurf     nidb PD       0:00      1 (Priority)
           262  computes freesurf     nidb PD       0:00      1 (Priority)
           263  computes freesurf     nidb PD       0:00      1 (Priority)
           245  computes freesurf     nidb  R       8:00      1 compute60
           246  computes freesurf     nidb  R       7:40      1 compute61
           247  computes freesurf     nidb  R       7:19      1 compute62
           248  computes freesurf     nidb  R       6:55      1 compute63

And the node and partition definitions from slurm.conf:

NodeName=compute60 NodeAddr=10.35.10.110 CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=515827 State=UNKNOWN
NodeName=compute61 NodeAddr=10.35.10.111 CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=515827 State=UNKNOWN
NodeName=compute62 NodeAddr=10.35.10.112 CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=515827 State=UNKNOWN
NodeName=compute63 NodeAddr=10.35.10.113 CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=515827 State=UNKNOWN

PartitionName=computes Nodes=compute60,compute61,compute62,compute63 Default=NO MaxTime=INFINITE State=UP
1 Answer

Make sure the SelectType option in the slurm.conf configuration file is set to select/cons_tres. See the documentation page on consumable resources in Slurm for more information.
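
As a rough sketch of what that change might look like: Slurm's default select plugin (select/linear, as far as I know) allocates whole nodes to a job, which would be consistent with the one-job-per-node pattern in the squeue output above. The SelectTypeParameters line below is an assumption on my part and not something the answer specifies. CR_Core treats cores as the consumable resource, so with the node definitions above (2 sockets x 16 cores) each node could accept up to 32 single-core jobs. Memory is not tracked in this variant.

```
# slurm.conf (sketch only; adjust to your site)
# Allocate individual resources instead of whole nodes.
SelectType=select/cons_tres
# Assumption: count cores, not hardware threads, as the schedulable unit.
SelectTypeParameters=CR_Core
```

A change to SelectType typically needs a restart of the Slurm daemons; `scontrol reconfigure` alone is usually not enough. If you pick a memory-tracking variant such as CR_Core_Memory instead, it is probably worth setting a default memory per CPU as well. Otherwise a job that requests no memory may be given the whole node's memory and block the others again.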