How to perform efficient parallel matrix multiplication for small external indices and large internal indices in Fortran