我正在尝试研究地球系统模型,对此我是新手。 目前,我只是尝试运行一个计算强度较小的测试用例。
我的系统是Ubuntu 20.04。我已按以下顺序构建了所需的库 - mpich、pnetcdf、zlib、hdf5、netcdf-c、netcdf-fortran、lapack 和 blas。版本如下(我的GCC和gfortran版本是9.4.0) mpich-3.3.1、pnetcdf-1.12.3、zlib-1.2.13、hdf5-1.10.5、netcdf-c-4.9.0、netcdf-fortran-4.6。 0、LAPACK 和BLAS 3.11。为了构建并行 I/O 支持,我在安装时遵循了 Pnetcdf、hdf5、Netcdf-c、最后 Netcdf-fortran 的顺序。所有库和软件包均已正确安装,没有任何错误,并且使用与我用于模型的相同编译器。
我现在遇到的问题与库的链接(pnetcdf、netcdf-c 和 netcdf-fortran)有关,更具体地说是顺序,正如该模型专用论坛所指出的那样。 在模型构建结束时,当它尝试创建单个可执行文件时,它会失败(collect2:错误:ld 返回 1 退出状态)。 以下是显示错误的命令
mpif90 -o /home/ubuntuvm/projects/cesm/scratch/testrun11/bld/cesm.exe \
cime_comp_mod.o cime_driver.o component_mod.o component_type_mod.o \
cplcomp_exchange_mod.o map_glc2lnd_mod.o map_lnd2glc_mod.o \
map_lnd2rof_irrig_mod.o mrg_mod.o prep_aoflux_mod.o prep_atm_mod.o \
prep_glc_mod.o prep_ice_mod.o prep_lnd_mod.o prep_ocn_mod.o \
prep_rof_mod.o prep_wav_mod.o seq_diag_mct.o seq_domain_mct.o \
seq_flux_mct.o seq_frac_mct.o seq_hist_mod.o seq_io_mod.o \
seq_map_mod.o seq_map_type_mod.o seq_rest_mod.o t_driver_timers_mod.o \
-L/home/ubuntuvm/projects/cesm/scratch/testrun11/bld/lib/ -latm \
-L/home/ubuntuvm/projects/cesm/scratch/testrun11/bld/lib/ -lice \
-L/home/ubuntuvm/projects/cesm/scratch/testrun11/bld/lib/ -llnd \
-L/home/ubuntuvm/projects/cesm/scratch/testrun11/bld/lib/ -locn \
-L/home/ubuntuvm/projects/cesm/scratch/testrun11/bld/lib/ -lrof \
-L/home/ubuntuvm/projects/cesm/scratch/testrun11/bld/lib/ -lglc \
-L/home/ubuntuvm/projects/cesm/scratch/testrun11/bld/lib/ -lwav \
-L/home/ubuntuvm/projects/cesm/scratch/testrun11/bld/lib/ -lesp \
-L../../gnu/mpich/nodebug/nothreads/mct/noesmf/c1a1l1i1o1r1g1w1e1/lib \
-lcsm_share -L../../gnu/mpich/nodebug/nothreads/lib -lpio -lgptl \
-lmct -lmpeu -L/home/ubuntuvm/CESM/lib -lnetcdff \
-Wl,-rpath=/home/ubuntuvm/CESM/lib -lnetcdf -lm -lnetcdf -lhdf5_hl \
-lhdf5 -lpnetcdf -ldl -lm -lz -Wl,-rpath=/home/ubuntuvm/CESM/lib \
-lpnetcdf -L/usr/local/lib -llapack -L/usr/local/lib -lblas \
-L/home/ubuntuvm/CESM/lib -lpnetcdf -L/home/ubuntuvm/CESM/lib
下面是部分错误其中libpio.a是在上述命令之前构建的组件库
/usr/bin/ld: ../../gnu/mpich/nodebug/nothreads/lib/libpio.a(nf_mod.F90.o): in function `__nf_mod_MOD_pio_copy_att':
nf_mod.F90:(.text+0x31): undefined reference to `nfmpi_copy_att'
/usr/bin/ld: ../../gnu/mpich/nodebug/nothreads/lib/libpio.a(nf_mod.F90.o): in function `__nf_mod_MOD_def_var_md':
nf_mod.F90:(.text+0x3b5): undefined reference to `nfmpi_def_var'
/usr/bin/ld: nf_mod.F90:(.text+0x4fe): undefined reference to `nfmpi_def_var'
/usr/bin/ld: ../../gnu/mpich/nodebug/nothreads/lib/libpio.a(nf_mod.F90.o): in function `__nf_mod_MOD_pio_def_dim':
nf_mod.F90:(.text+0xab9): undefined reference to `nfmpi_def_dim'
/usr/bin/ld: ../../gnu/mpich/nodebug/nothreads/lib/libpio.a(nf_mod.F90.o): in function `__nf_mod_MOD_pio_redef':
nf_mod.F90:(.text+0xeb9): undefined reference to `nfmpi_redef'
/usr/bin/ld: ../../gnu/mpich/nodebug/nothreads/lib/libpio.a(nf_mod.F90.o): in function `__nf_mod_MOD_pio_enddef':
nf_mod.F90:(.text+0xff0): undefined reference to `nfmpi_enddef'
/usr/bin/ld: ../../gnu/mpich/nodebug/nothreads/lib/libpio.a(nf_mod.F90.o): in function `__nf_mod_MOD_pio_inq_dimlen':
nf_mod.F90:(.text+0x115c): undefined reference to `nfmpi_inq_dimlen'
/usr/bin/ld: ../../gnu/mpich/nodebug/nothreads/lib/libpio.a(nf_mod.F90.o): in function `__nf_mod_MOD_pio_inq_dimname':
nf_mod.F90:(.text+0x14c2): undefined reference to `nfmpi_inq_dimname'
/usr/bin/ld: ../../gnu/mpich/nodebug/nothreads/lib/libpio.a(nf_mod.F90.o): in function `__nf_mod_MOD_pio_inq_dimid':
nf_mod.F90:(.text+0x1821): undefined reference to `nfmpi_inq_dimid'
/usr/bin/ld: ../../gnu/mpich/nodebug/nothreads/lib/libpio.a(nf_mod.F90.o): in function `__nf_mod_MOD_inq_varnatts_vid':
nf_mod.F90:(.text+0x1c24): undefined reference to `nfmpi_inq_varnatts'
/usr/bin/ld: ../../gnu/mpich/nodebug/nothreads/lib/libpio.a(nf_mod.F90.o): in function `__nf_mod_MOD_inq_vardimid_vid':
nf_mod.F90:(.text+0x1fcc): undefined reference to `nfmpi_inq_vardimid'
库链接如下
-L/home/ubuntuvm/CESM_Library/lib -lnetcdff -lnetcdf \
-Wl,-rpath=/home/ubuntuvm/CESM_Library/lib -lnetcdf -lm -lnetcdf -lhdf5_hl \
-lhdf5 -lpnetcdf -ldl -lm -lz -Wl,-rpath=/home/ubuntuvm/CESM_Library/lib -lpnetcdf
我在这里做错了什么?我将不胜感激有关图书馆顺序的任何建议,并很乐意提供可能需要的任何其他详细信息。
虽然很难重现这个特定的错误。看起来问题出在 PNetCDF 库上,它似乎不包含一些 FORTRAN 函数。
这里是从头开始在 Ubuntu 上使用 OpenMPI 设置 CESM 的简要说明。此示例适用于 Ubuntu 22.04 (Jammy),但应该适用于所有最新的 Ubuntu 版本。
假设主机名是
ubuntu-jammy
。
apt-get install -y gcc g++ gfortran build-essential cmake git \
subversion python-is-python3 perl vim python3-pip libxml2-utils unzip
-j
的 make
选项的参数。echo "ubuntu-jammy slots=8" >> /etc/openmpi/opempi-default-hostfile
mkdir CESM
cd CESM
RUN git clone -b release-cesm2.1.3 https://github.com/ESCOMP/CESM.git my_cesm_sandbox
mkdir ~/.cime
export CIME_MODEL=cesm
mkdir -p cesm/inputdata
my_cesm_sandbox/manage_externals/manic/repository_svn.py
以添加忽略未知 CA 的选项。这可能会引入漏洞,因此请三思。
让 SVN 命令看起来像这样:cmd = ['svn', 'checkout', '--non-interactive', '--trust-server-cert-failures=unknown-ca', '--quiet', url, repo_dir_path]
./manage_externals/checkout_externals
使用
./manage_externals/checkout_externals -S
检查是否已交付所有必要的代码。
$HOME/CESM/lib
下。CESM_LIB_DIR=$HOME/CESM/lib
ZLIB=$CESM_LIB_DIR/zlib
HDF5=$CESM_LIB_DIR/hdf5
NETCDF=$CESM_LIB_DIR/netcdf
PNETCDF=$CESM_LIB_DIR/pnetcdf
下载并解压 zlib、hdf5、netcdf-c、netcdf-fortran、pnetcdf、lapack
要构建 zlib,请使用默认设置: 在包含 zlib 源的目录中
./configure --prefix=$ZLIB
make -j 8
make check
make install
CPPFLAGS="-I$ZLIB/include" LDFLAGS="-L$ZLIB/lib" \
CC=mpicc CXX=mpicxx ./configure --prefix=$HDF5 --with-zlib=$ZLIB --enable-hl --enable-fortran --enable-parallel
配置后检查并行支持是否已启用:
SUMMARY OF THE HDF5 CONFIGURATION
=================================
...
Features:
---------
Parallel HDF5: yes
Parallel Filtered Dataset Writes: yes
Large Parallel I/O: yes
High-level library: yes
...
现在构建、检查和安装。
make -j8
make -j8 check
make install
CPPFLAGS="-I$HDF5/include -I$ZLIB/include" LDFLAGS="-L$HDF5/lib -L$ZLIB/lib" \
CC=mpicc CXX=mpicxx ./configure --prefix=$NETCDF --disable-dap --enable-parallel4
确保启用了正确的选项
# NetCDF C Configuration Summary
==============================
...
HDF5 Support: yes
NetCDF-4 API: yes
NC-4 Parallel Support: yes
...
现在制作、检查并安装
make -j8
make -j8 check
make install
CPPFLAGS="-I$NETCDF/include -I$HDF5/include -I$ZLIB/include" \
FFLAGS="-fallow-argument-mismatch -fallow-invalid-boz" \
LDFLAGS="-L$NETCDF/lib -L$HDF5/lib -L$ZLIB/lib" LD_LIBRARY_PATH="$NETCDF/lib:$LD_LIBRARY_PATH" \
CC=mpicc CXX=mpicxx FC=mpifort ./configure --prefix=$NETCDF
检查并行选项是否配置如下:
# NetCDF Fortran Configuration Summary
==============================
...
Parallel IO: yes
NetCDF4 Parallel IO: yes
PnetCDF Parallel IO: no
...
现在构建、检查和安装
make -j8
make -j8 check
make install
CPPFLAGS="-I$NETCDF/include -I$HDF5/include -I$ZLIB/include" \
FFLAGS="-fallow-argument-mismatch -fallow-invalid-boz" \
LDFLAGS="-L$NETCDF/lib -L$HDF5/lib -L$ZLIB/lib" \
LD_LIBRARY_PATH="$NETCDF/lib:$LD_LIBRARY_PATH" \
CC=mpicc CXX=mpicxx FC=mpifort ./configure --prefix=$PNETCDF --enable-shared --enable-fortran --enable-profiling --enable-large-file-test --with-netcdf4
奔跑
make -j8
make -j8 tests
make check
make ptest
make ptests
make install
虽然
ptest
在 4 个处理器上运行,但您至少应该有这个数量。
如果由于您的系统没有足够的处理器而导致某些 ptest 失败,请不要担心,因为某些测试需要 10 个处理器用于 MPI 或更多。
<?xml version="1.0"?>
<config_machines version="2.0">
<machine MACH="ubuntu-jammy">
<DESC>
Example port to Ubuntu Jammy linux system with gcc, netcdf, pnetcdf and openmpi
</DESC>
<NODENAME_REGEX>ubuntu-jammy</NODENAME_REGEX>
<OS>LINUX</OS>
<COMPILERS>gnu</COMPILERS>
<MPILIBS>openmpi</MPILIBS>
<PROJECT>none</PROJECT>
<SAVE_TIMING_DIR> </SAVE_TIMING_DIR>
<CIME_OUTPUT_ROOT>$ENV{HOME}/cesm/scratch</CIME_OUTPUT_ROOT>
<DIN_LOC_ROOT>$ENV{HOME}/cesm/inputdata</DIN_LOC_ROOT>
<DIN_LOC_ROOT_CLMFORC>$ENV{HOME}/cesm/inputdata/lmwg</DIN_LOC_ROOT_CLMFORC>
<DOUT_S_ROOT>$ENV{HOME}/cesm/archive/$CASE</DOUT_S_ROOT>
<BASELINE_ROOT>$ENV{HOME}/cesm/cesm_baselines</BASELINE_ROOT>
<CCSM_CPRNC>$ENV{HOME}/cesm/tools/cime/tools/cprnc/cprnc</CCSM_CPRNC>
<GMAKE>make</GMAKE>
<GMAKE_J>8</GMAKE_J>
<BATCH_SYSTEM>none</BATCH_SYSTEM>
<SUPPORTED_BY>[email protected]</SUPPORTED_BY>
<MAX_TASKS_PER_NODE>8</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>8</MAX_MPITASKS_PER_NODE>
<PROJECT_REQUIRED>FALSE</PROJECT_REQUIRED>
<mpirun mpilib="default">
<executable>mpiexec</executable>
<arguments>
<arg name="ntasks"> -np {{ total_tasks }} </arg>
</arguments>
</mpirun>
<module_system type="none" allow_error="true">
</module_system>
<environment_variables>
<env name="NETCDF">$ENV{HOME}/CESM/lib/necdf</env>
<env name="PNETCDF">$ENV{HOME}/CESM/lib/pnetcdf</env>
<env name="OMP_STACKSIZE">256M</env>
</environment_variables>
<resource_limits>
<resource name="RLIMIT_STACK">-1</resource>
</resource_limits>
</machine>
</config_machines>
验证 XML 文件
xmllint --noout --schema $HOME/CESM/my_cesm_sandbox/cime/config/xml_schemas/config_machines.xsd $HOME/.cime/config_machines.xml
.cime/config_compilers.xml
<?xml version="1.0" encoding="UTF-8"?>
<config_compilers version="2.0">
<compiler>
<LDFLAGS>
<append compile_threaded="true"> -fopenmp </append>
</LDFLAGS>
<FFLAGS>
<append> -fallow-argument-mismatch -fallow-invalid-boz</append>
</FFLAGS>
<SFC>gfortran</SFC>
<SCC>gcc</SCC>
<SCXX>g++</SCXX>
<MPIFC>mpifort</MPIFC>
<MPICC>mpicc</MPICC>
<MPICXX>mpicxx</MPICXX>
<CXX_LINKER>FORTRAN</CXX_LINKER>
<NETCDF_PATH>$ENV{HOME}/CESM/lib/nectdf</NETCDF_PATH>
<PNETCDF_PATH>$ENV{HOME}/CESM/lib/pnectdf</PNETCDF_PATH>
<SLIBS>
<append>-L $ENV{HOME}/CESM/lib/nectdf/lib -lnetcdff -lnetcdf -lm</append>
<append>-L $ENV{HOME}/CESM/lib/lapack -llapack -lblas</append>
</SLIBS>
</compiler>
</config_compilers>
验证 xml 文件
xmllint --noout --schema $HOME/CESM/my_cesm_sandbox/cime/config/xml_schemas/config_compilers_v2.xsd $HOME/.cime/config_compilers.xml
cd $HOME/CESM/my_cesm_sandbox
cime/scripts/create_newcase --case mycase --compset X --res f19_g16
cd mycase
./case.setup
./case.build