Performance of array lookup vs. bit shift

Question  Votes: 1  Answers: 2

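For an implementation of Othello (Reversi), I precomputed the powers of two and look them up in an array. Today, while trying to optimize for speed, this struck me as a premature optimization: I had simply assumed the lookup was faster and never actually benchmarked it.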

This is how I precompute the powers of two:

private static final long[] POWERS_OF_TWO = LongStream.range(0L, NUM_SQUARES)
                                                      .map(l -> 1L << l)
                                                      .toArray();

This is what the call site looks like:

final long newSelf = self | cur | POWERS_OF_TWO[i];

The alternative is to compute the power of two directly at the call site:

final long newSelf = self | cur | (1L << i);
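
Both call sites should produce the same mask for every square index from 0 to 63. A minimal sketch that checks this, assuming NUM_SQUARES is 64 (the class name PowersOfTwoCheck and the method names byLookup and byShift are only illustrative):

```java
import java.util.stream.LongStream;

final class PowersOfTwoCheck {
    // Assumption: an 8x8 Othello board, so 64 squares.
    static final int NUM_SQUARES = 64;

    // Same precomputation as in the question.
    static final long[] POWERS_OF_TWO = LongStream.range(0L, NUM_SQUARES)
                                                  .map(l -> 1L << l)
                                                  .toArray();

    // Variant 1: table lookup.
    static long byLookup(int i) {
        return POWERS_OF_TWO[i];
    }

    // Variant 2: shift computed at the call site.
    static long byShift(int i) {
        return 1L << i;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NUM_SQUARES; i++) {
            if (byLookup(i) != byShift(i)) {
                throw new AssertionError("mismatch at square " + i);
            }
        }
        System.out.println("lookup and shift agree for all " + NUM_SQUARES + " squares");
    }
}
```

So the two variants are interchangeable in terms of results, and the only open question is speed.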

Which approach is faster?

java arrays performance bitwise-operators
2 Answers
2
votes

You need a better test than the one in the answer.

Let's look at the loop code:

for (int i = 0; i < N; i++) {
    final int idx = i % NUM_SQUARES;
    final long value = 1L << idx;
    sum += value;
}

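A shift is a very fast CPU operation; it should be at least as fast as ++ and +=. Looking at the code, we have the loop counter (i++), the sum += and the branch (i < N). The %, by contrast, probably takes 10 to 20 times as many CPU cycles, so it dominates the loop.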

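So, to get a better timing for the lookup and the shift, I would remove the % and use all 64 powers inside the loop, in random order (for the sake of the array). Something like this: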

for (int i = 0; i < N; i++) {
    sum += 1L << 4;
    sum += 1L << 7;
    sum += 1L << 1;
    // and so on
}

For the array version you could use:

for (int i = 0; i < N; i++) {
    sum += POWERS_OF_TWO[4];
    sum += POWERS_OF_TWO[7];
    sum += POWERS_OF_TWO[1];
    // and so on
}

Or use a variable instead of constants, and a bitmask instead of %:

for (int i = 0; i < N; i++) {
    final int idx = i & 0x3f;
    sum += 1L << idx;
}

for (int i = 0; i < N; i++) {
    final int idx = i & 0x3f;
    sum += POWERS_OF_TWO[idx];
}
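
The bitmask can stand in for % here because the number of squares is a power of two and the loop counter is never negative. A small sketch of that equivalence, assuming 64 squares (the class and method names are illustrative, not from the answer); it also shows that Java itself uses only the low six bits of the shift distance when shifting a long:

```java
final class MaskVersusModulo {
    // Assumption: 64 squares, as in the question.
    static final int NUM_SQUARES = 64;

    // Index via remainder; the result is negative for negative i.
    static int byModulo(int i) {
        return i % NUM_SQUARES;
    }

    // Index via bitmask; the result is always in 0..63.
    static int byMask(int i) {
        return i & 0x3f;
    }

    public static void main(String[] args) {
        // For non-negative i both forms agree.
        for (int i = 0; i < 1_000; i++) {
            if (byModulo(i) != byMask(i)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        // They differ for negative input.
        System.out.println(byModulo(-1) + " vs " + byMask(-1)); // prints "-1 vs 63"
        // A long shift only looks at the low six bits of the distance.
        System.out.println((1L << 70) == (1L << 6));            // prints "true"
    }
}
```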

Or, if the optimizer can be fooled:

for (int i = 0; i < N; i++) {
    final int idx = 0;
    sum += 1L << (idx + 4);
    sum += 1L << (idx + 17);
    // ...
}

Either way, a shift is really fast, and preferring it over an array of precomputed values should be the sensible choice.


2
votes

I decided to write a JMH benchmark to test my question.

The Benchmark

import org.openjdk.jmh.Main;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.RunnerException;

import java.io.IOException;
import java.util.stream.LongStream;

@Warmup(iterations = 3)
@Measurement(iterations = 3)
@Fork(value = 1)
public class LookupVsShift {
    private static final int NUM_SQUARES = 64;
    private static final long[] POWERS_OF_TWO = LongStream.range(0L, NUM_SQUARES)
                                                          .map(l -> 1L << l)
                                                          .toArray();
    private static final int N = 1_000_000;

    @Benchmark
    public long testLookupUnroll() {
        long sum = 0;

        for (int i = 0; i < N; i++) {
            sum += (i + POWERS_OF_TWO[58]) & 0x3f;
            sum += (i + POWERS_OF_TWO[23]) & 0x3f;
            sum += (i + POWERS_OF_TWO[55]) & 0x3f;
            sum += (i + POWERS_OF_TWO[56]) & 0x3f;
            sum += (i + POWERS_OF_TWO[52]) & 0x3f;
            sum += (i + POWERS_OF_TWO[38]) & 0x3f;
            sum += (i + POWERS_OF_TWO[49]) & 0x3f;
            sum += (i + POWERS_OF_TWO[36]) & 0x3f;
            sum += (i + POWERS_OF_TWO[9]) & 0x3f;
            sum += (i + POWERS_OF_TWO[7]) & 0x3f;
            sum += (i + POWERS_OF_TWO[19]) & 0x3f;
            sum += (i + POWERS_OF_TWO[54]) & 0x3f;
            sum += (i + POWERS_OF_TWO[37]) & 0x3f;
            sum += (i + POWERS_OF_TWO[4]) & 0x3f;
            sum += (i + POWERS_OF_TWO[35]) & 0x3f;
            sum += (i + POWERS_OF_TWO[8]) & 0x3f;
            sum += (i + POWERS_OF_TWO[40]) & 0x3f;
            sum += (i + POWERS_OF_TWO[33]) & 0x3f;
            sum += (i + POWERS_OF_TWO[43]) & 0x3f;
            sum += (i + POWERS_OF_TWO[13]) & 0x3f;
            sum += (i + POWERS_OF_TWO[14]) & 0x3f;
            sum += (i + POWERS_OF_TWO[3]) & 0x3f;
            sum += (i + POWERS_OF_TWO[20]) & 0x3f;
            sum += (i + POWERS_OF_TWO[63]) & 0x3f;
            sum += (i + POWERS_OF_TWO[29]) & 0x3f;
            sum += (i + POWERS_OF_TWO[18]) & 0x3f;
            sum += (i + POWERS_OF_TWO[45]) & 0x3f;
            sum += (i + POWERS_OF_TWO[22]) & 0x3f;
            sum += (i + POWERS_OF_TWO[57]) & 0x3f;
            sum += (i + POWERS_OF_TWO[26]) & 0x3f;
            sum += (i + POWERS_OF_TWO[24]) & 0x3f;
            sum += (i + POWERS_OF_TWO[10]) & 0x3f;
            sum += (i + POWERS_OF_TWO[16]) & 0x3f;
            sum += (i + POWERS_OF_TWO[15]) & 0x3f;
            sum += (i + POWERS_OF_TWO[46]) & 0x3f;
            sum += (i + POWERS_OF_TWO[32]) & 0x3f;
            sum += (i + POWERS_OF_TWO[17]) & 0x3f;
            sum += (i + POWERS_OF_TWO[48]) & 0x3f;
            sum += (i + POWERS_OF_TWO[41]) & 0x3f;
            sum += (i + POWERS_OF_TWO[39]) & 0x3f;
            sum += (i + POWERS_OF_TWO[12]) & 0x3f;
            sum += (i + POWERS_OF_TWO[51]) & 0x3f;
            sum += (i + POWERS_OF_TWO[21]) & 0x3f;
            sum += (i + POWERS_OF_TWO[0]) & 0x3f;
            sum += (i + POWERS_OF_TWO[50]) & 0x3f;
            sum += (i + POWERS_OF_TWO[44]) & 0x3f;
            sum += (i + POWERS_OF_TWO[2]) & 0x3f;
            sum += (i + POWERS_OF_TWO[60]) & 0x3f;
            sum += (i + POWERS_OF_TWO[34]) & 0x3f;
            sum += (i + POWERS_OF_TWO[31]) & 0x3f;
            sum += (i + POWERS_OF_TWO[30]) & 0x3f;
            sum += (i + POWERS_OF_TWO[53]) & 0x3f;
            sum += (i + POWERS_OF_TWO[61]) & 0x3f;
            sum += (i + POWERS_OF_TWO[1]) & 0x3f;
            sum += (i + POWERS_OF_TWO[27]) & 0x3f;
            sum += (i + POWERS_OF_TWO[62]) & 0x3f;
            sum += (i + POWERS_OF_TWO[25]) & 0x3f;
            sum += (i + POWERS_OF_TWO[28]) & 0x3f;
            sum += (i + POWERS_OF_TWO[11]) & 0x3f;
            sum += (i + POWERS_OF_TWO[5]) & 0x3f;
            sum += (i + POWERS_OF_TWO[6]) & 0x3f;
            sum += (i + POWERS_OF_TWO[42]) & 0x3f;
            sum += (i + POWERS_OF_TWO[59]) & 0x3f;
            sum += (i + POWERS_OF_TWO[47]) & 0x3f;
        }

        return sum;
    }

    @Benchmark
    public long testShiftUnroll() {
        long sum = 0;

        for (int i = 0; i < N; i++) {
            sum += 1L << (i + 35) & 0x3f;
            sum += 1L << (i + 52) & 0x3f;
            sum += 1L << (i + 55) & 0x3f;
            sum += 1L << (i + 57) & 0x3f;
            sum += 1L << (i + 38) & 0x3f;
            sum += 1L << (i + 13) & 0x3f;
            sum += 1L << (i + 36) & 0x3f;
            sum += 1L << (i + 19) & 0x3f;
            sum += 1L << (i + 7) & 0x3f;
            sum += 1L << (i + 48) & 0x3f;
            sum += 1L << (i + 8) & 0x3f;
            sum += 1L << (i + 0) & 0x3f;
            sum += 1L << (i + 45) & 0x3f;
            sum += 1L << (i + 2) & 0x3f;
            sum += 1L << (i + 14) & 0x3f;
            sum += 1L << (i + 44) & 0x3f;
            sum += 1L << (i + 31) & 0x3f;
            sum += 1L << (i + 6) & 0x3f;
            sum += 1L << (i + 25) & 0x3f;
            sum += 1L << (i + 18) & 0x3f;
            sum += 1L << (i + 34) & 0x3f;
            sum += 1L << (i + 41) & 0x3f;
            sum += 1L << (i + 37) & 0x3f;
            sum += 1L << (i + 32) & 0x3f;
            sum += 1L << (i + 1) & 0x3f;
            sum += 1L << (i + 53) & 0x3f;
            sum += 1L << (i + 9) & 0x3f;
            sum += 1L << (i + 16) & 0x3f;
            sum += 1L << (i + 62) & 0x3f;
            sum += 1L << (i + 4) & 0x3f;
            sum += 1L << (i + 12) & 0x3f;
            sum += 1L << (i + 46) & 0x3f;
            sum += 1L << (i + 17) & 0x3f;
            sum += 1L << (i + 29) & 0x3f;
            sum += 1L << (i + 63) & 0x3f;
            sum += 1L << (i + 51) & 0x3f;
            sum += 1L << (i + 21) & 0x3f;
            sum += 1L << (i + 24) & 0x3f;
            sum += 1L << (i + 49) & 0x3f;
            sum += 1L << (i + 40) & 0x3f;
            sum += 1L << (i + 58) & 0x3f;
            sum += 1L << (i + 59) & 0x3f;
            sum += 1L << (i + 33) & 0x3f;
            sum += 1L << (i + 61) & 0x3f;
            sum += 1L << (i + 56) & 0x3f;
            sum += 1L << (i + 42) & 0x3f;
            sum += 1L << (i + 5) & 0x3f;
            sum += 1L << (i + 23) & 0x3f;
            sum += 1L << (i + 22) & 0x3f;
            sum += 1L << (i + 43) & 0x3f;
            sum += 1L << (i + 60) & 0x3f;
            sum += 1L << (i + 15) & 0x3f;
            sum += 1L << (i + 11) & 0x3f;
            sum += 1L << (i + 27) & 0x3f;
            sum += 1L << (i + 30) & 0x3f;
            sum += 1L << (i + 54) & 0x3f;
            sum += 1L << (i + 10) & 0x3f;
            sum += 1L << (i + 3) & 0x3f;
            sum += 1L << (i + 50) & 0x3f;
            sum += 1L << (i + 28) & 0x3f;
            sum += 1L << (i + 47) & 0x3f;
            sum += 1L << (i + 20) & 0x3f;
            sum += 1L << (i + 26) & 0x3f;
            sum += 1L << (i + 39) & 0x3f;
        }

        return sum;
    }

    public static void main(final String[] args) throws IOException, RunnerException {
        Main.main(args);
    }
}

Results

# Run complete. Total time: 00:02:07

Benchmark                        Mode  Cnt   Score    Error  Units
LookupVsShift.testLookupUnroll  thrpt    3  23,072 ±  7,674  ops/s
LookupVsShift.testShiftUnroll   thrpt    3  20,834 ± 16,676  ops/s

Result

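I feel that this benchmark now compares the two fairly, and that the overhead is no longer large enough to make the difference negligible. As it stands, the lookup seems to be slightly faster. I think this just comes down to what Danny_ds said: it is hard to design an accurate benchmark for operations this fast...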
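
When reading these numbers it helps to know how Java parses the benchmark expressions. The << operator binds more tightly than &, so 1L << (i + 35) & 0x3f is evaluated as (1L << (i + 35)) & 0x3f, and in the lookup variant i is added to the table value rather than to the index. A small sketch of what the shift form evaluates to (the class and method names are illustrative; maskedDistance is a hypothetical alternative reading, not necessarily what the benchmark was meant to compute):

```java
final class ShiftPrecedenceDemo {
    // The expression exactly as it appears in testShiftUnroll.
    static long asWritten(int i) {
        return 1L << (i + 35) & 0x3f;
    }

    // Explicitly parenthesized the way the compiler parses it.
    static long asParsed(int i) {
        return (1L << (i + 35)) & 0x3f;
    }

    // Hypothetical alternative: mask the shift distance, keep the full result.
    static long maskedDistance(int i) {
        return 1L << ((i + 35) & 0x3f);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            if (asWritten(i) != asParsed(i)) {
                throw new AssertionError("parse differs at " + i);
            }
        }
        System.out.println(asWritten(0));      // 0: bit 35 lies outside the low six bits
        System.out.println(maskedDistance(0)); // 34359738368, i.e. 2^35
    }
}
```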
