Matrix multiplication via outer products vs. via transposition in CUDA

Matrix multiplication via vector outer products works as follows:

A = 
[1 2 3]
[4 5 6]
[7 8 9]

B=
[1 2 3]
[4 5 6]
[7 8 9]

Each term is the outer product of column k of A with row k of B, and C is the sum of the three terms:

A(:,1) ⨂ B(1,:) = [1; 4; 7] ⨂ [1 2 3] = [1 2 3; 4 8 12; 7 14 21]
A(:,2) ⨂ B(2,:) = [2; 5; 8] ⨂ [4 5 6] = [8 10 12; 20 25 30; 32 40 48]
A(:,3) ⨂ B(3,:) = [3; 6; 9] ⨂ [7 8 9] = [21 24 27; 42 48 54; 63 72 81]

Adding the three matrices entry by entry gives, for example, C(1,1) = 1 + 8 + 21 = 30 and C(3,3) = 21 + 48 + 81 = 150.

Therefore, the product of matrices A and B is:

C = 
[30 36 42]
[66 81 96]
[102 126 150]
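
The outer-product formulation can be sketched in plain C as a sum of rank-1 updates: for each k, column k of A times row k of B is added into C. This is a minimal CPU sketch, not a CUDA kernel, and it assumes square n x n matrices stored in row-major order as flat arrays of double. The function name matmul_outer is hypothetical and does not come from the question.

```c
#include <stddef.h>

/* Sketch: C = A * B for n x n row-major matrices, accumulated as a sum
   of n outer products (column k of A times row k of B).
   matmul_outer is an illustrative name, not an existing API. */
void matmul_outer(size_t n, const double *A, const double *B, double *C)
{
    /* Start from the zero matrix so the rank-1 terms can be added in. */
    for (size_t i = 0; i < n * n; ++i)
        C[i] = 0.0;

    for (size_t k = 0; k < n; ++k) {          /* one outer product per k */
        for (size_t i = 0; i < n; ++i) {
            double a_ik = A[i * n + k];       /* element i of column k of A */
            for (size_t j = 0; j < n; ++j)
                C[i * n + j] += a_ik * B[k * n + j];  /* element j of row k of B */
        }
    }
}
```

Called with n = 3 and the matrices A and B above, this fills C with the same nine values as the result shown.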

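On the other hand, multiplication using a transposed matrix works as follows:

To compute the product of A and B, we first transpose matrix B, which gives: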

B^T =
[1 4 7]
[2 5 8]
[3 6 9]

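Each entry C(i,j) of A*B is then the dot product of row i of A with row j of B^T, which gives a 3x3 matrix.

A * B =
[1*1 + 2*4 + 3*7  1*2 + 2*5 + 3*8  1*3 + 2*6 + 3*9]
[4*1 + 5*4 + 6*7  4*2 + 5*5 + 6*8  4*3 + 5*6 + 6*9]
[7*1 + 8*4 + 9*7  7*2 + 8*5 + 9*8  7*3 + 8*6 + 9*9]

Simplifying the expression in each entry, we get:

A * B =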
[30  36  42]
[66  81  96]
[102 126 150]

Therefore, A*B is the 3x3 matrix:

[30  36  42]
[66  81  96]
[102 126 150]
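
The transpose-based computation can be sketched in plain C in the same spirit: B is transposed once, and each entry of the product is the dot product of a row of A with a row of B^T, so both operands are read along contiguous rows. Again this is a minimal CPU sketch, not a CUDA kernel, assuming square n x n row-major matrices stored as flat arrays of double. The names transpose_square and matmul_with_bt are hypothetical.

```c
#include <stddef.h>

/* Sketch: Bt = transpose of the n x n row-major matrix B.
   transpose_square is an illustrative name, not an existing API. */
void transpose_square(size_t n, const double *B, double *Bt)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            Bt[j * n + i] = B[i * n + j];
}

/* Sketch: C = A * B, given Bt = transpose of B.
   C(i,j) is the dot product of row i of A with row j of Bt. */
void matmul_with_bt(size_t n, const double *A, const double *Bt, double *C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * Bt[j * n + k];  /* both rows read contiguously */
            C[i * n + j] = sum;
        }
    }
}
```

Calling transpose_square and then matmul_with_bt with n = 3 and the matrices above yields the same 3x3 result. The two formulations perform the same multiplications and differ only in the order in which they are accumulated and in how memory is traversed.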

In the context of CUDA, does outer-product matrix multiplication have any advantage over transposed matrix multiplication?

c cuda matrix-multiplication