I installed the oneAPI Base Toolkit and HPC Toolkit (2024.0) on a shared cluster to test GEMM performance, but I am running into a segmentation fault and do not know how to resolve it.
I used the offline installer and installed locally. I also followed the instructions on the page below to configure the NVIDIA GPU. https://developer.codeplay.com/products/oneapi/nvidia/2024.0.0/guides/get-started-guide-nvidia.html#dpc-resources
Here is the output:
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_cuda.so [ PluginVersion: 14.38.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_unified_runtime.so [ PluginVersion: 14.37.1 ]
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]: platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]: device: Tesla V100-PCIE-16GB
The results are correct!
I compiled the test program from the GitHub link below with the following options. https://github.com/oneapi-src/oneMKL/blob/89cfda5c360b34a21f280ae11ecc00abd8e350f4/examples/blas/run_time_dispatching/level3/gemm_usm.cpp
icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_70 gemm_usm.cpp -o dpcpp_dgemm_v100 -Wl,-rpath=/home01/r907a03/intel/oneapi/mkl/2024.0/lib /home01/r907a03/intel/oneapi/mkl/2024.0/lib/libmkl_sycl_blas.so /home01/r907a03/intel/oneapi/mkl/2024.0/lib/libmkl_intel_lp64.so /home01/r907a03/intel/oneapi/mkl/2024.0/lib/libmkl_tbb_thread.so /home01/r907a03/intel/oneapi/mkl/2024.0/lib/libmkl_core.so /home01/r907a03/intel/oneapi/tbb/2021.11/lib/libtbb.so.12
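For comparison, the same build can be expressed against the variables that `setvars.sh` exports, which avoids hard-coding the per-file `.so` paths (a hedged sketch; it assumes the oneAPI 2024.0 environment has been sourced so that `MKLROOT` and `TBBROOT` are set, and that `-l` resolution finds the same libraries listed above):

```shell
# Assumes: "source ~/intel/oneapi/setvars.sh" has been run first,
# so MKLROOT points at mkl/2024.0 and TBBROOT at tbb/2021.11.
icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
     -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_70 \
     gemm_usm.cpp -o dpcpp_dgemm_v100 \
     -L"${MKLROOT}/lib" -Wl,-rpath,"${MKLROOT}/lib" \
     -lmkl_sycl_blas -lmkl_intel_lp64 -lmkl_tbb_thread -lmkl_core \
     -L"${TBBROOT}/lib" -Wl,-rpath,"${TBBROOT}/lib" -ltbb
```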
I also set the following environment variable so that the GPU device is detected automatically.
export ONEAPI_DEVICE_SELECTOR="ext_oneapi_cuda:*"
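To double-check what the selector actually resolves to before running the example, the runtime's own tools can be used (a sketch; `sycl-ls` honors `ONEAPI_DEVICE_SELECTOR`, and `SYCL_PI_TRACE=1` prints plugin and device-selection information to stderr in the 2024.0 runtime):

```shell
export ONEAPI_DEVICE_SELECTOR="ext_oneapi_cuda:*"
sycl-ls                             # should now list only the two CUDA GPUs
SYCL_PI_TRACE=1 ./dpcpp_dgemm_v100  # trace which plugin and device get picked
```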
Here is the output:
########################################################################
# General Matrix-Matrix Multiplication using Unified Shared Memory Example:
#
# C = alpha * A * B + beta * C
#
# where A, B and C are general dense matrices and alpha, beta are
# floating point type precision scalars.
#
# Using apis:
# gemm
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable SYCL_DEVICE_FILTER can be used to specify
# SYCL device
#
########################################################################
Running BLAS GEMM USM example on GPU device.
Device name is: Tesla V100-PCIE-16GB
Running with single precision real data type:
Segmentation fault
I do not know how to fix this.
I also tried setting "export LIBOMPTARGET_PLUGIN=OPENCL", as suggested in the thread below, but it did not help either. https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-examples-segmentation-fault/td-p/1213659
I also tried another GEMM example, and it fails in the same way. https://github.com/oneapi-src/oneAPI-samples/tree/master/Libraries/oneMKL/matrix_mul_mkl To force the program to select the GPU device, I used
queue Q( gpu_selector_v );
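One low-effort way to narrow down where the crash happens is to build the queue with an async exception handler and print the chosen device before any MKL call is made; if the segfault occurs before the device name prints, the problem is in runtime or library setup rather than in the GEMM itself (a sketch, assuming a 2024.0 DPC++ toolchain; it needs a SYCL compiler and a visible GPU to run):

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    // Surface asynchronous SYCL errors instead of crashing silently.
    auto handler = [](sycl::exception_list elist) {
        for (auto& e : elist) {
            try { std::rethrow_exception(e); }
            catch (const sycl::exception& ex) {
                std::cerr << "Async SYCL exception: " << ex.what() << "\n";
            }
        }
    };
    try {
        sycl::queue Q(sycl::gpu_selector_v, handler);
        std::cout << "Selected: "
                  << Q.get_device().get_info<sycl::info::device::name>()
                  << "\n";
        Q.wait_and_throw();  // flush any pending async errors
    } catch (const sycl::exception& ex) {
        std::cerr << "Sync SYCL exception: " << ex.what() << "\n";
        return 1;
    }
    return 0;
}
```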
Here is the system information:
CentOS Linux release 7.9.2009 (Core)
GCC version 12.2.0
CUDA version 12.1
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:18:00.0 Off | 0 |
| N/A 27C P0 25W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... Off | 00000000:AF:00.0 Off | 0 |
| N/A 29C P0 27W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:cpu:2] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2021.12.9.0.24_005321]
[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, Tesla V100-PCIE-16GB 7.0 [CUDA 11.6]
[ext_oneapi_cuda:gpu:1] NVIDIA CUDA BACKEND, Tesla V100-PCIE-16GB 7.0 [CUDA 11.6]
I would like to know where libmkl_sycl_blas.so comes from. It may be an old version: the SYCL library to link against has changed to portBLAS, so if you want to use the portBLAS backend with oneMKL you should link the library named libonemkl_blas_portblas.so.
I would also use --offload-arch=sm_70 instead of --cuda-gpu-arch=sm_70.