"There are not enough slots available" error when running Open MPI on a Linux Databricks cluster


I am trying to run a C application on a Databricks cluster using MPI.

I downloaded Open MPI from https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.3.tar.gz and installed it on the Databricks cluster.

It was built on the Databricks cluster, which runs Ubuntu:

  Operating system/version: Linux 4.4.0 Ubuntu
  Computer hardware: x86_64
  Network type: databricks

I am trying to run it from a Python notebook on Databricks:

%sh
mpirun --allow-run-as-root -np 20  MY_c_Application

MY_c_Application is written in C and was compiled on the Databricks Linux nodes.

My Databricks cluster has 21 nodes, one of which is the driver. Each node has 32 cores.

When I run the command above, I get the following error message.

Could you tell me what causes this, or what I am missing?

Thanks


There are not enough slots available in the system to satisfy the 20
slots that were requested by the application:

  MY_c_application

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:

  Hostfile, via "slots=N" clauses (N defaults to number of processor
  cores if not provided)

  The --host command line parameter, via a ":N" suffix on the hostname
  (N defaults to 1 if not provided)

  Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)

  If none of a hostfile, the --host command line parameter, or an RM
  is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
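The error message names a ":N" suffix on --host as one way to declare slots. The snippet below is a minimal sketch of building that argument in a %sh cell, assuming the cell runs under bash. The worker hostnames and the 32-slots-per-node figure are hypothetical placeholders for illustration, not values taken from the cluster:

```shell
#!/usr/bin/env bash
# Sketch: build an Open MPI --host value of the form "host:N,host:N,...".
# ASSUMPTION: the hostnames below are hypothetical placeholders; replace
# them with the real worker addresses of your cluster.
set -euo pipefail

SLOTS_PER_NODE=32                        # assumed: one slot per core
WORKERS=(worker-01 worker-02 worker-03)  # hypothetical worker hostnames

host_arg=""
for h in "${WORKERS[@]}"; do
  # Append "host:N"; ${host_arg:+,} adds a comma only after the first entry.
  host_arg+="${host_arg:+,}${h}:${SLOTS_PER_NODE}"
done

echo "$host_arg"
# The value could then be passed to mpirun, for example:
#   mpirun --allow-run-as-root -np 20 --host "$host_arg" ./MY_C_APP
```

With the three placeholder hosts shown, this prints worker-01:32,worker-02:32,worker-03:32. mpirun must also be able to reach the listed hosts; without a resource manager, Open MPI typically launches remote processes over ssh.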

Update

After adding a hostfile, the problem went away:

 sudo mpirun --allow-run-as-root -np 25 --hostfile my_hostfile ./MY_C_APP 
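The question does not show the contents of my_hostfile. As a hedged sketch, assuming bash, a hostfile in the "hostname slots=N" form described by the error message could be generated as follows. The hostnames are hypothetical placeholders, and 32 slots per node is an assumption based on the core count mentioned above:

```shell
#!/usr/bin/env bash
# Sketch: write an Open MPI hostfile with one "hostname slots=N" line per node.
# ASSUMPTION: hostnames are hypothetical placeholders, not real Databricks names.
set -euo pipefail

HOSTFILE=my_hostfile
SLOTS_PER_NODE=32                        # assumed: one slot per core
WORKERS=(worker-01 worker-02 worker-03)  # hypothetical worker hostnames

: > "$HOSTFILE"   # create or truncate the hostfile
for h in "${WORKERS[@]}"; do
  printf '%s slots=%d\n' "$h" "$SLOTS_PER_NODE" >> "$HOSTFILE"
done

cat "$HOSTFILE"

# Total slots = number of hosts * slots per host; -np should not exceed
# this unless --oversubscribe is given.
total_slots=$(( ${#WORKERS[@]} * SLOTS_PER_NODE ))
echo "total slots: $total_slots"
```

With three placeholder hosts at 32 slots each this reports 96 total slots, so -np 25 would fit. Listing all 20 worker nodes at 32 slots each would give 640.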

Thanks

mpi openmpi azure-databricks
1 Answer

Posting the original poster's solution as an answer:

After adding a hostfile, the problem was resolved.

 sudo mpirun --allow-run-as-root -np 25 --hostfile my_hostfile ./MY_C_APP 
© www.soinside.com 2019 - 2024. All rights reserved.