Segmentation fault when calling the gemm function of the DPC++ BLAS library on an NVIDIA GPU

Question (votes: 0, answers: 1)

I installed the oneAPI Base Toolkit and HPC Toolkit (2024.0) on a shared cluster to test gemm performance, but I am hitting a segmentation fault and do not know how to resolve it.

I used the offline installer and installed locally. I also followed the instructions on the following page to configure the NVIDIA GPU: https://developer.codeplay.com/products/oneapi/nvidia/2024.0.0/guides/get-started-guide-nvidia.html#dpc-resources

Here is the trace output:

SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_cuda.so [ PluginVersion: 14.38.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_unified_runtime.so [ PluginVersion: 14.37.1 ]
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]:   platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]:   device: Tesla V100-PCIE-16GB
The results are correct!

I compiled the test program from the following GitHub link with the options below: https://github.com/oneapi-src/oneMKL/blob/89cfda5c360b34a21f280ae11ecc00abd8e350f4/examples/blas/run_time_dispatching/level3/gemm_usm.cpp

icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_70 gemm_usm.cpp -o dpcpp_dgemm_v100 -Wl,-rpath=/home01/r907a03/intel/oneapi/mkl/2024.0/lib /home01/r907a03/intel/oneapi/mkl/2024.0/lib/libmkl_sycl_blas.so /home01/r907a03/intel/oneapi/mkl/2024.0/lib/libmkl_intel_lp64.so /home01/r907a03/intel/oneapi/mkl/2024.0/lib/libmkl_tbb_thread.so /home01/r907a03/intel/oneapi/mkl/2024.0/lib/libmkl_core.so /home01/r907a03/intel/oneapi/tbb/2021.11/lib/libtbb.so.12

I also set the following environment variable so that the GPU device is detected automatically:

export ONEAPI_DEVICE_SELECTOR="ext_oneapi_cuda:*"
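To double-check which device that selector actually resolves to, I can compile a small standalone query like the sketch below (my own check, independent of the oneMKL example; the file name and sizes are arbitrary):

// check_device.cpp -- print which device/platform a default queue resolves to
// build (assumption, same toolchain as above): icpx -fsycl check_device.cpp -o check_device
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    sycl::queue q;  // honors ONEAPI_DEVICE_SELECTOR
    auto dev = q.get_device();
    std::cout << "Device:   " << dev.get_info<sycl::info::device::name>() << "\n";
    std::cout << "Platform: "
              << dev.get_platform().get_info<sycl::info::platform::name>() << "\n";
    return 0;
}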

Here is the output of the gemm example:

########################################################################
# General Matrix-Matrix Multiplication using Unified Shared Memory Example:
#
# C = alpha * A * B + beta * C
#
# where A, B and C are general dense matrices and alpha, beta are
# floating point type precision scalars.
#
# Using apis:
#   gemm
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable SYCL_DEVICE_FILTER can be used to specify
# SYCL device
#
########################################################################

Running BLAS GEMM USM example on GPU device.
Device name is: Tesla V100-PCIE-16GB
Running with single precision real data type:
Segmentation fault

I do not know how to fix this.

I also tried setting "export LIBOMPTARGET_PLUGIN=OPENCL" as suggested in the thread below, but that did not help either. https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-examples-segmentation-fault/td-p/1213659

I also tried another gemm sample, which did not work either: https://github.com/oneapi-src/oneAPI-samples/tree/master/Libraries/oneMKL/matrix_mul_mkl To force the program to select the GPU device, I used

queue Q( gpu_selector_v ); 
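One thing that may help narrow this down (my own sketch, not taken from the sample) is to construct the queue with an asynchronous exception handler, so that SYCL/oneMKL errors are reported as messages instead of the program simply dying:

// Queue with an async handler so library errors surface as readable exceptions.
#include <sycl/sycl.hpp>
#include <iostream>
#include <exception>

int main() {
    auto async_handler = [](sycl::exception_list exceptions) {
        for (const std::exception_ptr &e : exceptions) {
            try {
                std::rethrow_exception(e);
            } catch (const sycl::exception &ex) {
                std::cerr << "Async SYCL exception: " << ex.what() << "\n";
            }
        }
    };

    sycl::queue Q(sycl::gpu_selector_v, async_handler);
    std::cout << "Running on: "
              << Q.get_device().get_info<sycl::info::device::name>() << "\n";
    // ... submit the gemm here ...
    Q.wait_and_throw();  // flush any asynchronous errors
    return 0;
}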


Here is the system information:

CentOS Linux release 7.9.2009 (Core)

GCC version 12.2.0

CUDA version 12.1

$ nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:18:00.0 Off |                    0 |
| N/A   27C    P0    25W / 250W |      4MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   29C    P0    27W / 250W |      4MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:cpu:2] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2021.12.9.0.24_005321]
[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, Tesla V100-PCIE-16GB 7.0 [CUDA 11.6]
[ext_oneapi_cuda:gpu:1] NVIDIA CUDA BACKEND, Tesla V100-PCIE-16GB 7.0 [CUDA 11.6]
Tags: nvidia intel-oneapi sycl dpc++
1 Answer
0 votes

I am wondering where libmkl_sycl_blas.so comes from. It might be an old version, because the SYCL library you would link against has changed to portBLAS, so if you want to use the portBLAS backend with oneMKL you should use the library with this name: libonemkl_blas_portblas.so

Also, I would never have thought of using --cuda-gpu-arch=sm_70; I would use --offload-arch=sm_70 instead.
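For reference, the run-time dispatching call that gemm_usm.cpp makes boils down to roughly the sketch below (my own simplified version, not the exact example code; matrix sizes and initialization are placeholders). The backend library that gets dispatched to is chosen from the queue's device, which is why linking the right SYCL backend library matters:

// Simplified sketch of a USM gemm via the run-time dispatching oneMKL API.
#include <sycl/sycl.hpp>
#include <cstdint>
#include "oneapi/mkl.hpp"

void run_gemm(sycl::queue &q) {
    const std::int64_t m = 64, n = 64, k = 64;
    const std::int64_t lda = m, ldb = k, ldc = m;
    float alpha = 1.0f, beta = 0.0f;

    // USM device allocations for column-major A (m x k), B (k x n), C (m x n)
    float *A = sycl::malloc_device<float>(lda * k, q);
    float *B = sycl::malloc_device<float>(ldb * n, q);
    float *C = sycl::malloc_device<float>(ldc * n, q);
    // ... fill A and B (e.g. q.fill or a host-to-device copy) ...

    auto trans = oneapi::mkl::transpose::nontrans;
    // Run-time dispatching: the backend is selected from q's device.
    sycl::event done = oneapi::mkl::blas::column_major::gemm(
        q, trans, trans, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
    done.wait();

    sycl::free(A, q); sycl::free(B, q); sycl::free(C, q);
}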
