Slurm standalone system on Ubuntu 16.04.3 (compiled from source) not working: authentication


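I have been struggling to install Slurm and am at a loss. My goal is to install Slurm on a single machine and submit jobs from that same machine (via sbatch or srun).

Initially I tried installing with apt install slurm-llnl, but the version packaged for Ubuntu 16.04.3 is out of date.

So the next step was to compile Slurm from source. After downloading and unpacking the tarball, I ran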

./configure --prefix=/etc/init.d/ --sysconfdir=/etc/slurm-llnl/
make
make install

Then I added the following as /etc/ld.so.conf.d/SlurmLib.conf

/etc/init.d/lib
/etc/init.d/lib/slurm
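
For reference, a minimal sketch of that step, assuming the /etc/init.d prefix used above. `write_slurm_ld_conf` is a hypothetical helper name, not part of Slurm. A file added under /etc/ld.so.conf.d normally only takes effect once the linker cache is refreshed with `ldconfig`; the question does not say whether that was done.

```shell
#!/bin/sh
# Hypothetical helper: write the two library directories for a given
# Slurm install prefix into a linker configuration file.
write_slurm_ld_conf() {
    prefix="$1"   # install prefix that was passed to ./configure --prefix
    conf="$2"     # destination file, e.g. /etc/ld.so.conf.d/SlurmLib.conf
    # ${prefix%/} drops a single trailing slash so the paths stay clean.
    printf '%s\n' "${prefix%/}/lib" "${prefix%/}/lib/slurm" > "$conf"
}

# Usage (as root), followed by a refresh of the linker cache:
#   write_slurm_ld_conf /etc/init.d/ /etc/ld.so.conf.d/SlurmLib.conf
#   ldconfig
```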

Then I created cgroup.conf, slurm.conf and slurmdb.conf.

[cgroup.conf]

CgroupAutomount=yes
ConstrainCores=no
ConstrainRAMSpace=no

[slurm.conf]

ControlMachine=arroyavelab15
AuthType=auth/none
CryptoType=crypto/munge
MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/slurm_dir/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/slurm_dir/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/slurm_dir/spool/slurmd/
SlurmUser=danielsauceda
SlurmdUser=danielsauceda
StateSaveLocation=/var/slurm_dir/spool
SwitchType=switch/none
TaskPlugin=task/none
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear
AccountingStorageType=accounting_storage/none
AccountingStoreJobComment=YES
ClusterName=cluster
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=5
SlurmctldLogFile=/var/slurm_dir/slurmctld.log
SlurmdDebug=3
NodeName=arroyavelab15 NodeAddr=xxx.xxx.xxx.xxx.xx CPUs=1 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN RealMemory=8000
PartitionName=debug Nodes=arroyavelab15 Default=YES MaxTime=INFINITE State=UP
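
Since the problem turns out to be about authentication, it can help to pull just the relevant settings out of the file. A small sketch follows; `show_auth_settings` is a hypothetical helper, and the path in the usage line assumes slurm.conf lives in the --sysconfdir given to configure. For the slurm.conf above it would print AuthType=auth/none followed by CryptoType=crypto/munge.

```shell
#!/bin/sh
# Hypothetical helper: print the authentication-related settings
# from a slurm.conf-style file, in the order they appear.
show_auth_settings() {
    grep -E '^(AuthType|CryptoType)=' "$1"
}

# Usage (path is an assumption based on --sysconfdir):
#   show_auth_settings /etc/slurm-llnl/slurm.conf
```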

[slurmdb.conf]

# slurmDBD info                                                                           
DbdAddr=localhost
DbdHost=localhost
SlurmUser=danielsauceda
DebugLevel=4
PidFile=/var/run/slurmdbd.pid
#                                                                                         
# Database info                                                                           
StorageType=accounting_storage/mysql
StoragePass=slurm
StorageUser=slurm

Finally, after waiting, I ran

./slurmctld -D
./slurmd -D
./slurmdbd -Dv

They all appear to run (each in its own terminal).

But when I execute

srun -N3 --nodes=1 --ntasks-per-node=1 hostname

I get the following

srun: error: Couldn't find the specified plugin name for auth/munge looking at all files
srun: error: cannot find auth plugin for auth/munge
srun: error: cannot create auth context for auth/munge
srun: error: Couldn't find the specified plugin name for auth/munge looking at all files
srun: error: cannot find auth plugin for auth/munge
srun: error: cannot create auth context for auth/munge
srun: error: Couldn't find the specified plugin name for auth/munge looking at all files
srun: error: cannot find auth plugin for auth/munge
srun: error: cannot create auth context for auth/munge
srun: error: authentication: authentication initialization failure
srun: error: Srun communication socket apparently being written to by something other than Slurm
srun: error: Unable to allocate resources: Protocol authentication error

I don't know what the problem is, and searching online has not helped much.


linux authentication ubuntu-16.04 slurm
1 Answer

Install munge from the package manager, then build Slurm with the --with-munge= option. auth_munge.so should then appear under $PREFIX/lib/slurm.
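
A rough sketch of those steps on Ubuntu, not a verified recipe. The package names `munge` and `libmunge-dev`, the `/usr` value passed to `--with-munge`, and both helper names are assumptions; the prefix and sysconfdir in the usage lines are the ones from the question. The first helper only prints the commands so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the rebuild commands.
print_munge_rebuild_steps() {
    prefix="$1"      # Slurm install prefix
    sysconfdir="$2"  # directory holding slurm.conf
    munge_dir="$3"   # where munge is installed; /usr is an assumption for apt
    echo "sudo apt-get install munge libmunge-dev"
    echo "./configure --prefix=$prefix --sysconfdir=$sysconfdir --with-munge=$munge_dir"
    echo "make"
    echo "sudo make install"
}

# Hypothetical helper: check that the munge auth plugin was installed
# under <prefix>/lib/slurm after the rebuild.
check_munge_plugin() {
    plugin="${1%/}/lib/slurm/auth_munge.so"
    if [ -f "$plugin" ]; then
        echo "found $plugin"
    else
        echo "missing $plugin" >&2
        return 1
    fi
}

# Usage:
#   print_munge_rebuild_steps /etc/init.d /etc/slurm-llnl /usr
#   check_munge_plugin /etc/init.d
```

If the check still reports the plugin as missing after rebuilding, the configure output is the place to look for whether munge was actually detected.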
