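I'm trying to set up an interface so I can use cublas.lib from Fortran without any separate C code. I've looked at a few examples that do this and tried to copy them, but I'm running into trouble.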
Both of these examples work for me (cudart and cusolver):
https://forums.developer.nvidia.com/t/using-cusolverdn-in-fortran-code/39732/5
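I have C:\Program Files\NVIDIA GPUComputing Toolkit\CUDA 12.2\lib\x64 as an additional include directory, and cublas.lib, cusolver.lib and cudart.lib as additional dependencies. Everything compiles fine (I'm able to run the examples above).
When I run the code below, cublasCreate returns 7 (CUBLAS_STATUS_INVALID_VALUE).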
!==================================================================
!Interface to cusolverDn and CUDA C functions
!==================================================================
! C binding
! https://gcc.gnu.org/onlinedocs/gfortran/ISO_005fC_005fBINDING.html
!
! Similar CUDA examples
! https://stackoverflow.com/questions/27507169/find-available-graphics-card-memory-using-fortran
! https://forums.developer.nvidia.com/t/using-cusolverdn-in-fortran-code/39732/5
! https://stackoverflow.com/questions/22390812/returning-a-pointer-to-a-device-allocated-matrix-from-c-to-fortran
! https://stackoverflow.com/questions/35150748/mixed-language-cuda-programming
module cudaThings
interface
! cudaMalloc
integer (c_int) function cudaMalloc ( buffer, size ) bind (C, name="cudaMalloc" )
use iso_c_binding
implicit none
type (c_ptr) :: buffer
integer (c_size_t), value :: size
end function cudaMalloc
! cudaMemcpy
! A_mem_stat = cudaMemcpy(gpuPtr,cpuPtr,sizeof(ptr),cudaMemcpyHostToDevice)
! note: cudaMemcpyHostToDevice = 1
! note: cudaMemcpyDeviceToHost = 2
integer (c_int) function cudaMemcpy ( dst, src, count, kind ) bind (C, name="cudaMemcpy" )
use iso_c_binding
implicit none
type (C_PTR), value :: dst, src
integer (c_size_t), value :: count
integer (c_int), value :: kind   ! cudaMemcpyKind is a C enum (int-sized), not a size_t
end function cudaMemcpy
! cudaFree
integer (c_int) function cudaFree(buffer) bind(C, name="cudaFree")
use iso_c_binding
implicit none
type (C_PTR), value :: buffer
end function cudaFree
! get memory info
integer (c_int) function cudaMemGetInfo(fre, tot) bind(C, name="cudaMemGetInfo")
use iso_c_binding
implicit none
type(c_ptr),value :: fre
type(c_ptr),value :: tot
end function cudaMemGetInfo
integer(c_int) function cusolverDnCreate(cusolver_Hndl) bind(C,name="cusolverDnCreate")
use iso_c_binding
implicit none
type(c_ptr)::cusolver_Hndl
end function
integer(c_int) function cusolverDnDestroy(cusolver_Hndl) bind(C,name="cusolverDnDestroy")
use iso_c_binding
implicit none
type(c_ptr),value::cusolver_Hndl
end function
integer(c_int) function cublasCreate(cublas_Hndl) bind(C,name="cublasCreate_v2")
use iso_c_binding
implicit none
type(c_ptr),value::cublas_Hndl
end function
integer(c_int) function cublasDestroy(cublas_Hndl) bind(C,name="cublasDestroy_v2")
use iso_c_binding
implicit none
type(c_ptr),value::cublas_Hndl
end function
end interface
end module
program cudaTest
use iso_c_binding
use cudaThings
implicit none
! GPU stuff
type(c_ptr) :: cublas_Hndl
integer*4 :: cublas_stat
! get handle
cublas_stat = cublasCreate(cublas_Hndl)
write(*,*) cublas_stat
if (cublas_stat .ne. 0 ) then
write (*, '(A, I2)') " cublasCreate error: ", cublas_stat
stop
end if
end program
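I'm using Windows 10, Intel Fortran and CUDA 12.2, with a 930M graphics card.
To understand what is going on, you need to look at how the underlying C API works before writing the interface.
In C, the correct canonical call looks like this: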
cublasHandle_t handle;
cublasStatus_t status = cublasCreate(&handle);
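This is the idiomatic way of passing a cublasHandle_t (which is itself a pointer to an opaque structure) by reference, even though C has no explicit pass-by-reference semantics.
If you do this instead: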
cublasHandle_t *handle;
cublasStatus_t status = cublasCreate(handle);
you are passing an uninitialized pointer to the routine, which should fail. I haven't done much F2003-style C interop, but it looks to me as though this:
type(c_ptr) :: cublas_Hndl
integer*4 :: cublas_stat
! get handle
cublas_stat = cublasCreate(cublas_Hndl)
is equivalent to the C version that shouldn't work, whereas this:
type(c_ptr), target :: cublas_Hndl   ! c_loc requires the TARGET (or POINTER) attribute
integer*4 :: cublas_stat
! get handle
cublas_stat = cublasCreate(c_loc(cublas_Hndl))
is equivalent to the first, working C version, and is much more likely to work.