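I'm trying to find the best way to combine slices into a database key.
I need a slice that is the concatenation of three things.
I benchmarked the following three functions. I expected slot_key_array to be the fastest because it doesn't allocate a vector, but slot_key_sized is consistently the fastest. Why?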
#[inline]
pub fn slot_key_vec(address: &[u8], slot: &[u8]) -> Vec<u8> {
    let mut key: Vec<u8> = vec![DB_PREFIX_SLOT];
    key.extend_from_slice(address);
    key.extend_from_slice(slot);
    key
}

#[inline]
pub fn slot_key_sized(address: &[u8], slot: &[u8]) -> Vec<u8> {
    let mut key: Vec<u8> = Vec::with_capacity(1 + 20 + 32);
    key.push(DB_PREFIX_SLOT);
    key.extend_from_slice(address);
    key.extend_from_slice(slot);
    key
}

#[inline]
pub fn slot_key_array(address: &[u8], slot: &[u8]) -> [u8; 53] {
    let key: [u8; 53] = [
        DB_PREFIX_SLOT,
        address[0], address[1], address[2], address[3], address[4], address[5], address[6], address[7],
        address[8], address[9], address[10], address[11], address[12], address[13], address[14], address[15],
        address[16], address[17], address[18], address[19],
        slot[0], slot[1], slot[2], slot[3], slot[4], slot[5], slot[6], slot[7],
        slot[8], slot[9], slot[10], slot[11], slot[12], slot[13], slot[14], slot[15],
        slot[16], slot[17], slot[18], slot[19], slot[20], slot[21], slot[22], slot[23],
        slot[24], slot[25], slot[26], slot[27], slot[28], slot[29], slot[30], slot[31],
    ];
    key
}
You are probably on Linux, where the default allocator is good (I benchmarked on Windows, and after replacing the allocator with mimalloc I got the same results as you).
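The allocator matters because both Vec versions allocate on the heap, and they differ in how often the buffer has to grow. The following is a minimal sketch of one way to observe that, not a measurement of allocator cost. The helper `count_capacity_changes` and the value of `DB_PREFIX_SLOT` are assumptions made for illustration.

```rust
// Hypothetical prefix value; the real constant is not shown in the question.
const DB_PREFIX_SLOT: u8 = 0x01;

/// Appends `address` and `slot` to `key` and returns how many times the
/// vector's capacity changed along the way. A capacity change means the
/// buffer was reallocated.
pub fn count_capacity_changes(mut key: Vec<u8>, address: &[u8], slot: &[u8]) -> usize {
    let mut changes = 0;
    let mut cap = key.capacity();
    for part in [address, slot] {
        key.extend_from_slice(part);
        if key.capacity() != cap {
            changes += 1;
            cap = key.capacity();
        }
    }
    changes
}
```

Starting from `Vec::with_capacity(1 + 20 + 32)` with the prefix pushed, the count is 0, since the reserved capacity already covers all 53 bytes. Starting from `vec![DB_PREFIX_SLOT]`, the buffer has to grow at least once, so the cost of `slot_key_vec` depends more heavily on how fast the allocator is.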
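The array version is slower because it has to bounds-check every element access, which prevents vectorization and adds overhead. It doesn't have to, though: if the parameters are arrays instead of slices, the bounds checks disappear.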
pub fn slot_key_array_arrays(address: &[u8; 20], slot: &[u8; 32]) -> [u8; 53] {
    // ...
}
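If the callers only have slices, the lengths can be checked once at the boundary and the data then passed on as array references. This is a sketch under assumptions: the names `slot_key_from_arrays` and `slot_key_checked`, the function bodies, and the value of `DB_PREFIX_SLOT` are illustrative and not taken from the original code.

```rust
use std::convert::TryFrom;

// Hypothetical prefix value; the real constant is not shown in the question.
const DB_PREFIX_SLOT: u8 = 0x01;

/// Builds the key from fixed-size arrays. The lengths are part of the types,
/// so the compiler knows every copy fits.
pub fn slot_key_from_arrays(address: &[u8; 20], slot: &[u8; 32]) -> [u8; 53] {
    let mut key = [0u8; 53];
    key[0] = DB_PREFIX_SLOT;
    key[1..21].copy_from_slice(address);
    key[21..].copy_from_slice(slot);
    key
}

/// Accepts slices, checks each length exactly once, and returns `None`
/// instead of panicking when a length is wrong.
pub fn slot_key_checked(address: &[u8], slot: &[u8]) -> Option<[u8; 53]> {
    let address = <&[u8; 20]>::try_from(address).ok()?;
    let slot = <&[u8; 32]>::try_from(slot).ok()?;
    Some(slot_key_from_arrays(address, slot))
}
```

The conversion with `try_from` borrows the data and does not copy it, so the only extra cost is one length comparison per argument.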
Alternatively, use copy_from_slice():
#[inline]
pub fn slot_key_copy_from_slice(address: &[u8], slot: &[u8]) -> [u8; 53] {
    let mut key: [u8; 53] = [0; 53];
    key[0] = DB_PREFIX_SLOT;
    key[1..][..20].copy_from_slice(address);
    key[21..].copy_from_slice(slot);
    key
}
Benchmarks:
copy_from_slice         time:   [6.7784 ns 6.7921 ns 6.8103 ns]
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe
array_arrays            time:   [4.4975 ns 4.5019 ns 4.5073 ns]
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe
array                   time:   [13.063 ns 13.107 ns 13.165 ns]
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) high mild
  12 (12.00%) high severe
sized                   time:   [10.228 ns 10.251 ns 10.278 ns]
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe
sized #2                time:   [24.553 ns 24.586 ns 24.622 ns]
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) low severe
  5 (5.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe
So the version that takes arrays instead of slices is the fastest, followed by copy_from_slice(), and then the other versions.
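To reproduce a comparison like this without a benchmarking crate, a rough timing loop built only on the standard library might look like the sketch below. It is much cruder than a statistical harness (no warm-up, no outlier analysis), and the helper names `time_per_call` and `report`, as well as the value of `DB_PREFIX_SLOT`, are assumptions made for illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical prefix value; the real constant is not shown in the question.
const DB_PREFIX_SLOT: u8 = 0x01;

pub fn slot_key_copy_from_slice(address: &[u8], slot: &[u8]) -> [u8; 53] {
    let mut key = [0u8; 53];
    key[0] = DB_PREFIX_SLOT;
    key[1..][..20].copy_from_slice(address);
    key[21..].copy_from_slice(slot);
    key
}

/// Average wall-clock time per call of `f`. `black_box` keeps the optimizer
/// from discarding the result and removing the call entirely.
pub fn time_per_call<R>(iters: u32, mut f: impl FnMut() -> R) -> Duration {
    let iters = iters.max(1); // avoid dividing by zero
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}

/// Example driver: times one of the functions and prints the average.
pub fn report() {
    let address = [0xAAu8; 20];
    let slot = [0xBBu8; 32];
    let per_call = time_per_call(1_000_000, || {
        slot_key_copy_from_slice(black_box(&address[..]), black_box(&slot[..]))
    });
    println!("copy_from_slice: {:?} per call", per_call);
}
```

The inputs are also passed through `black_box` so the compiler cannot specialize the function for constant arguments. Build with optimizations (`--release`), since debug builds keep overhead that hides the differences being measured.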