缓存友好的方阵转置逻辑问题

问题描述 投票:0回答:1

此代码是一种转置算法,专门针对行和列均为 8 的倍数的方阵进行了优化。优化涉及稍后处理满足某些条件的组件,这些条件已在代码中注释掉。然而,转置在错误的地方失败了。

void block8x4_opt(int M, int N, int A[N][M], int B[M][N])
{
    register int i, j, i1, j1;
    for (i = 0; i < N; i += 8)
        for (j = 0; j < M; j += 4) {
            if (j == i || j == i + 4) { /* same block */
                for (i1 = i; i1 < i + 8 ; i1++)
                    for (j1 = j; j1 < j + 4 ; j1++) {
                        if (i1 == j1 || i1 == j1 + 4) /* same row */
                            continue;
                        B[j1][i1] = A[i1][j1];
                    }
                for (j1 = j; j1 < j + 4; j1++) {
                    B[j1][j1] = A[j1][j1]; /* i1 == j1 */
                    B[j1][j1+4] = A[j1+4][j1]; /* i1 == j1 + 4 */
                }
                continue;   
            }
            for (i1 = i; i1 < i + 8 ; i1++)
                for (j1 = j; j1 < j + 4 ; j1++)
                    B[j1][i1] = A[i1][j1];
        }
}

下面是使用上面的代码将 8x8 矩阵 A 转置为 B 的结果。

问题出现在(0,5),(1,6),(2,7)处。

奇怪的是,我用上面的算法跳过的部分是该部分的顶部,即第 4-13-22-31 行。正是它下面的那一行导致了问题,其他块不会导致任何问题。

*************** matrix A *****************
         0    1    2    3    4    5    6    7
        ...  ...  ...  ...  ...  ...  ...  ...
  0:     0    1    2    3    4    5    6    7
  1:     8    9   10   11   12   13   14   15
  2:    16   17   18   19   20   21   22   23
  3:    24   25   26   27   28   29   30   31
  4:    32   33   34   35   36   37   38   39
  5:    40   41   42   43   44   45   46   47
  6:    48   49   50   51   52   53   54   55
  7:    56   57   58   59   60   61   62   63

              **** matrix B ****
         0    1    2    3    4    5    6    7
        ...  ...  ...  ...  ...  ...  ...  ...
  0:     0    8   16   24   32   40   48   56
  1:     1    9   17   25   33   41   49   57
  2:     2   10   18   26   34   42   50   58
  3:     3   11   19   27   35   43   51   59
  4:     4   12   20   28   36   44   52   60
  5:     ?   13   21   29   37   45   53   61
  6:     6    0   22   30   38   46   54   62
  7:     7   15    ?   31   39   47   55   63

如果您能找到我的代码的问题,我将非常感激。

c algorithm matrix caching transpose
1个回答
0
投票

看这个:

for (i = 0; i < N; i += 8)
    for (j = 0; j < M; j += 4) {
        if (j == i || j == i + 4) { /* same block */

对于 8x8 矩阵,

N
M
都是 8,所以就像:

for (i = 0; i < 8; i += 8)
    for (j = 0; j < 8; j += 4) {
        if (j == i || j == i + 4) { /* same block */

那么

i
j
将采用哪些值:

i==0, j==0
i==0, j==4
// Now the inner loop ends as j becomes 8 due j+=4
// Now the outer loop ends as i becomes 8 due i+=8

So the `if` statement will only be executed for (i,j) as (0,0) and (0,4). If both cases the `if` condition will be true. As the last statement in the body of the `if` is a `continue`, the code will never reach:

        for (i1 = i; i1 < i + 8 ; i1++)
            for (j1 = j; j1 < j + 4 ; j1++)
                B[j1][i1] = A[i1][j1];

换句话说 - 你的算法是错误的。

© www.soinside.com 2019 - 2024. All rights reserved.