Open MPI and OpenFabrics: warning about registerable physical memory

Question (votes: 0, answers: 2)

I launch mpirun with the command:

mpirun -np 2 prog

and get the following output:

--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

http://www.open-mpi.org/faq/?category=openfabrics#ib-..

Local host: node107
Registerable memory: 32768 MiB
Total memory: 65459 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
hello from 0
hello from 1
[node107:48993] 1 more process has sent help message help-mpi-btl-openib.txt / reg mem limit low
[node107:48993] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Other installed software (the Intel MPI Library) works fine, without any errors, and uses all 64 GB of memory.

For Open MPI I do not use any batch manager (PBS/Torque, Slurm, etc.); I work on a single node, which I reach with the command

ssh node107

For the command

cat /etc/security/limits.conf

I get the following output:

...
* soft rss  2000000
* soft stack    2000000
* hard stack    unlimited
* soft data     unlimited
* hard data     unlimited
* soft memlock unlimited
* hard memlock unlimited
* soft nproc   10000
* hard nproc   10000
* soft nofile   10000
* hard nofile   10000
* hard cpu unlimited 
* soft cpu unlimited 
...
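The file above only lists the configured values; the memlock limit actually in effect for a login session may differ, for example if the session was not started through PAM. A minimal sketch for checking it in the shell that launches mpirun (the function name memlock_limits is hypothetical):

```shell
#!/bin/sh
# memlock_limits: hypothetical helper that prints the soft and hard
# locked-memory limits of the current shell. A numeric value is in KiB;
# "unlimited" would match the limits.conf entries shown above.
memlock_limits() {
    echo "soft: $(ulimit -S -l)"
    echo "hard: $(ulimit -H -l)"
}

memlock_limits
```

If either line prints a number rather than "unlimited", the limits.conf settings are not reaching this session.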

For the command

cat /sys/module/mlx4_core/parameters/log_num_mtt

I get the output:

0

Command:

cat /sys/module/mlx4_core/parameters/log_mtts_per_seg

Output:

3

Command:

getconf PAGESIZE

Output:

4096    

With these parameters and the formula

max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE

I get max_reg_mem = 2^0 * 2^3 * 4096 = 32768 bytes, which is not the 32 GB (32768 MiB) reported in the Open MPI warning.
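The arithmetic can be reproduced in the shell. This is only a sketch of the formula quoted above; the function name max_reg_mem and its argument order are chosen here for illustration.

```shell
#!/bin/sh
# max_reg_mem: evaluate (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE.
# Arguments: log_num_mtt, log_mtts_per_seg, page size in bytes.
max_reg_mem() {
    echo $(( (1 << $1) * (1 << $2) * $3 ))
}

# Values read on node107: log_num_mtt=0, log_mtts_per_seg=3, PAGESIZE=4096
max_reg_mem 0 3 4096    # prints 32768 (bytes)
```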

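What is the reason for this? Is it possible that Open MPI is not using the Mellanox driver and the log_num_mtt and log_mtts_per_seg parameters at all? How can I configure OpenFabrics to use all 64 GB of memory?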

linux openmpi system-administration
2 Answers

0 votes

I solved this problem by installing the latest version of Open MPI (2.0.2).


0 votes

In /etc/modprobe.d/mlx4_core.conf, add the following module parameter:

options mlx4_core log_mtts_per_seg=5

Reload the mlx4_core module:

rmmod mlx4_ib; rmmod mlx4_core; modprobe mlx4_ib

Check that log_mtts_per_seg now shows the value configured above:

cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
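As a rough check of what this change buys, the formula from the question can be re-evaluated with the new value. The sketch below assumes an effective log_num_mtt of 20. That number is not read from the system (sysfs reported 0); it is inferred from the warning, since 32768 MiB = 2^20 * 2^3 * 4096 bytes. The helper name reg_mem_mib is hypothetical.

```shell
#!/bin/sh
# reg_mem_mib: hypothetical helper, prints registerable memory in MiB for a
# given log_num_mtt, log_mtts_per_seg and page size in bytes.
reg_mem_mib() {
    echo $(( ((1 << $1) * (1 << $2) * $3) >> 20 ))
}

# ASSUMPTION: effective log_num_mtt = 20, inferred from the warning text,
# not from /sys/module/mlx4_core/parameters/log_num_mtt.
reg_mem_mib 20 3 4096   # log_mtts_per_seg=3 (before): 32768
reg_mem_mib 20 5 4096   # log_mtts_per_seg=5 (after):  131072
```

Under that assumption the new setting allows 131072 MiB to be registered, which is above the 65459 MiB of total memory reported for node107.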
