Matrix multiplication via vector outer products works as follows:
A =
[1 2 3]
[4 5 6]
[7 8 9]
B =
[1 2 3]
[4 5 6]
[7 8 9]
The product is the sum of the outer products of column k of A with row k of B, for k = 1, 2, 3:
A(:,1) ⨂ B(1,:) = [1; 4; 7] ⨂ [1 2 3] = [1 2 3; 4 8 12; 7 14 21]
A(:,2) ⨂ B(2,:) = [2; 5; 8] ⨂ [4 5 6] = [8 10 12; 20 25 30; 32 40 48]
A(:,3) ⨂ B(3,:) = [3; 6; 9] ⨂ [7 8 9] = [21 24 27; 42 48 54; 63 72 81]
Adding the three matrices entry by entry gives each element of C:
C(1,1) = 1 + 8 + 21 = 30
C(1,2) = 2 + 10 + 24 = 36
C(1,3) = 3 + 12 + 27 = 42
C(2,1) = 4 + 20 + 42 = 66
C(2,2) = 8 + 25 + 48 = 81
C(2,3) = 12 + 30 + 54 = 96
C(3,1) = 7 + 32 + 63 = 102
C(3,2) = 14 + 40 + 72 = 126
C(3,3) = 21 + 48 + 81 = 150
Therefore, the product of matrices A and B is:
C =
[30 36 42]
[66 81 96]
[102 126 150]
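A minimal Python sketch of the outer-product formulation, assuming it means accumulating one rank-1 matrix per inner index k (column k of A times row k of B). The function name matmul_outer is hypothetical and chosen only for illustration.

```python
def matmul_outer(A, B):
    """Compute A * B as a sum of rank-1 outer products, one per inner index k."""
    n, inner, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for k in range(inner):
        col_k = [A[i][k] for i in range(n)]  # k-th column of A
        row_k = B[k]                         # k-th row of B
        # Accumulate the outer product of col_k and row_k into C.
        for i in range(n):
            for j in range(m):
                C[i][j] += col_k[i] * row_k[j]
    return C


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matmul_outer(A, B))  # [[30, 36, 42], [66, 81, 96], [102, 126, 150]]
```

Each pass of the outer loop touches every entry of C once, so C is updated in place while A is read one column at a time and B one row at a time.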
Transposed matrix multiplication, on the other hand, works as follows:
To compute the product of A and B, we first transpose matrix B, which gives:
B^T =
[1 4 7]
[2 5 8]
[3 6 9]
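Each entry C(i,j) is then the dot product of row i of A with row j of B^T, so the product of A and B is a 3x3 matrix.
A * B =
[1*1 + 2*4 + 3*7,  1*2 + 2*5 + 3*8,  1*3 + 2*6 + 3*9]
[4*1 + 5*4 + 6*7,  4*2 + 5*5 + 6*8,  4*3 + 5*6 + 6*9]
[7*1 + 8*4 + 9*7,  7*2 + 8*5 + 9*8,  7*3 + 8*6 + 9*9]
Simplifying the expression in each entry, we get:
A * B =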
[30 36 42]
[66 81 96]
[102 126 150]
Therefore, A*B is the 3x3 matrix:
[30 36 42]
[66 81 96]
[102 126 150]
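A minimal Python sketch of the transposed approach, assuming the idea is to transpose B once and then take row-by-row dot products, so that both operands are read along their rows. The function name matmul_transposed is hypothetical.

```python
def matmul_transposed(A, B):
    """Compute A * B by transposing B, then taking dot products of rows."""
    n, inner, m = len(A), len(B), len(B[0])
    # Row j of Bt is column j of B.
    Bt = [[B[k][j] for k in range(inner)] for j in range(m)]
    # C[i][j] is the dot product of row i of A and row j of Bt.
    return [[sum(a * b for a, b in zip(A[i], Bt[j])) for j in range(m)]
            for i in range(n)]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matmul_transposed(A, B))  # [[30, 36, 42], [66, 81, 96], [102, 126, 150]]
```

Here every entry of C is finished in one pass over a pair of rows, instead of being built up across several outer-product updates.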
In the context of CUDA, does outer-product matrix multiplication have any advantage over transposed matrix multiplication?