I'm trying to create a 'static closure that moves in a collection of string slices, a Vec<&str> to be exact.
My first attempt looked like this:
fn main() {
    let sentence = "Foo to the bar".to_string();
    let my_closure = get_closure(sentence);
    my_closure();
}

fn get_closure(sentence: String) -> impl Fn() {
    let words: Vec<&str> = sentence.split_whitespace().collect();
    move || {
        // Since we move `sentence`, the slices that refer to it are no longer valid.
        // If we don't do this, we just get a lifetime error for `sentence` instead.
        let _mover = &sentence;
        for word in &words {
            println!("{word}");
        }
    }
}
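The idea was to move in both the String and the slices, so that the references stay valid. Since a String is just a reference to the underlying str (and since we never mutate the String, the inner fat pointer (?) should not change or be dropped, if I understand correctly), I thought this would work. Instead the compiler complains that sentence is moved, even though only the underlying str matters to us.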
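The intuition above can be checked with a small standard-library sketch. Moving a String copies its pointer, length, and capacity while the heap buffer stays where it is. The helper name heap_ptr_survives_move is made up for illustration. The borrow checker does not reason at that level, though: a &str obtained from sentence borrows the String value itself, so moving that value ends the borrow as far as the compiler is concerned.

```rust
/// Hypothetical helper: reports whether the heap buffer address is the
/// same before and after the `String` value is moved.
fn heap_ptr_survives_move(sentence: String) -> bool {
    // Address of the first byte of the heap buffer.
    let before = sentence.as_ptr();
    // Moving the String moves (pointer, length, capacity), not the bytes.
    let moved = sentence;
    before == moved.as_ptr()
}
```

Calling heap_ptr_survives_move("Foo to the bar".to_string()) returns true, which is why the unsafe version further down happens to work, and also why the compiler cannot be expected to accept the safe version as written.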
My next solution was to use unsafe and transmute to extend the lifetime of words, while still moving in the string.
fn main() {
    let sentence = "Foo to the bar".to_string();
    let my_closure = get_closure(sentence);
    my_closure();
}

fn get_closure(sentence: String) -> impl Fn() {
    let words: Vec<&'static str> = unsafe {
        std::mem::transmute(sentence.split_whitespace().collect::<Vec<&str>>())
    };
    move || {
        // We move in sentence to extend its lifetime.
        let _mover = &sentence;
        for word in &words {
            println!("{word}");
        }
    }
}
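This works as expected without any issues, but I still have some questions:
Just to give a bit of context: the real use case is in a hot path, so the reason I want to do this is to avoid splitting on every call. The straightforward solution would be to move in the String and split it on every call.
I wrote a benchmark to compare the unsafe version against splitting on every call. I know that building the Vec<&str> before the benchmark starts is technically wrong, but I doubt that splitting the input once would change the results in any significant way. Feel free to take a look: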
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn get_closure_unsafe(sentence: String) -> impl Fn(usize) {
    let words: Vec<&'static str> =
        unsafe { std::mem::transmute(sentence.split_whitespace().collect::<Vec<&str>>()) };
    move |_: usize| {
        let _mover = &sentence;
        for word in &words {
            word.as_bytes().iter().sum::<u8>();
        }
    }
}

fn get_closure_safe(sentence: String) -> impl Fn(usize) {
    move |_: usize| {
        // We move in sentence to extend its lifetime.
        for word in sentence.split_whitespace() {
            word.as_bytes().iter().sum::<u8>();
        }
    }
}

fn bench_fibs(c: &mut Criterion) {
    let mut group = c.benchmark_group("Str split");
    let sentence = "Foo to the bar".to_string();
    let safe_closure = get_closure_safe(sentence.clone());
    group.bench_function("safe", |b| b.iter(|| safe_closure(black_box(1))));
    let unsafe_closure = get_closure_unsafe(sentence);
    group.bench_function("unsafe", |b| b.iter(|| unsafe_closure(black_box(1))));
    group.finish();
}

criterion_group!(comparison, bench_fibs);
criterion_main!(comparison);
Results:
Str split/safe time: [15.707 ns 15.739 ns 15.779 ns]
change: [-0.2673% -0.0179% +0.2351%] (p = 0.89 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
Str split/unsafe time: [214.13 ps 214.37 ps 214.74 ps]
change: [-0.4720% -0.2975% -0.0848%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
1 (1.00%) high mild
10 (10.00%) high severe
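A note on reading these numbers: about 214 ps is on the order of a single CPU cycle, which suggests (this was not verified here) that the optimizer may have removed the loop in the unsafe variant because the per-word sum is discarded. In addition, sum::<u8>() overflows on this input ("Foo" alone sums to 292), which would panic in a build with overflow checks enabled. A minimal sketch of a workload that is harder to optimize away, using only the standard library, could look like the following. The name get_closure_checksum is hypothetical, and the closure would still need to be plugged into the criterion harness above.

```rust
use std::hint::black_box;

/// Hypothetical variant of the "safe" closure: it returns a wrapping
/// checksum over all word bytes, so the work has an observable result.
fn get_closure_checksum(sentence: String) -> impl Fn() -> u8 {
    move || {
        let sum = sentence
            .split_whitespace()
            .flat_map(|word| word.bytes())
            // wrapping_add avoids the overflow panic that `sum::<u8>()`
            // would hit in builds with overflow checks enabled.
            .fold(0u8, u8::wrapping_add);
        // black_box asks the optimizer to treat the value as used.
        black_box(sum)
    }
}
```

Applying the same change to the unsafe variant would make the two measurements comparable; no new timings are claimed here.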
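"Since a String is just a reference to the underlying str": it is a reference in the sense that it stores a pointer to a str, but it is very much an owned (and, more importantly, Drop) value, and its underlying str is deallocated as soon as it is dropped. You can simply move the declaration of words into the closure and get rid of the other "move hack".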
fn get_closure(sentence: String) -> impl Fn() {
    move || {
        let words: Vec<&str> = sentence.split_whitespace().collect();
        for word in &words {
            println!("{word}");
        }
    }
}
If you want to avoid building words on every call, just take a reference to sentence:
fn get_closure<'a>(sentence: &'a str) -> impl 'a + Fn() {
    let words: Vec<&str> = sentence.split_whitespace().collect();
    move || {
        for word in &words {
            println!("{word}");
        }
    }
}
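The trade-off of the borrowing version is that the closure is no longer 'static: the caller has to keep the String alive for as long as the closure is used. A minimal usage sketch follows. The name get_borrowing_closure and the usize return value (the number of words printed) are additions for illustration only, so the behaviour can be checked.

```rust
/// Hypothetical variant of the borrowing closure that also returns how
/// many words it printed.
fn get_borrowing_closure<'a>(sentence: &'a str) -> impl Fn() -> usize + 'a {
    // Split once, up front; the slices borrow from the caller's string.
    let words: Vec<&'a str> = sentence.split_whitespace().collect();
    move || {
        for word in &words {
            println!("{word}");
        }
        words.len()
    }
}

/// The owner of the String must outlive the closure.
fn run_twice(owned: &String) -> usize {
    let closure = get_borrowing_closure(owned);
    closure() + closure()
}
```

Dropping the String while the closure is still alive is rejected at compile time, which is exactly the guarantee the transmute version gives up.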
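This is tricky because if you move both an object and something that borrows from it into a closure, the closure object has to be self-referential. One crate that supports this is ouroboros: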
use ouroboros::self_referencing;

fn main() {
    let sentence = "Foo to the bar".to_string();
    let my_closure = get_closure(sentence);
    my_closure();
}

#[self_referencing]
struct ClosureData {
    pub sentence: String,
    #[borrows(sentence)]
    #[covariant]
    pub words: Vec<&'this str>,
}

fn get_closure(sentence: String) -> impl Fn() {
    let closure_data = ClosureDataBuilder {
        sentence,
        words_builder: |sentence| sentence.split_whitespace().collect(),
    }.build();
    // now you can move closure_data
    move || {
        let _mover = closure_data.borrow_sentence();
        for word in closure_data.borrow_words() {
            println!("{word}");
        }
    }
}
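If an extra dependency is not wanted, a different technique (not the one used above) is to store byte ranges instead of references. The ranges do not borrow from the String, so both can be moved into a 'static closure without unsafe. This is only a sketch under that assumption: SplitSentence and get_closure_with_ranges are hypothetical names, the closure returns a word count purely so it can be checked, and each slice operation re-validates bounds at run time, a cost that was not measured here.

```rust
use std::ops::Range;

/// Hypothetical helper type: owns the text plus the byte range of each word.
struct SplitSentence {
    sentence: String,
    ranges: Vec<Range<usize>>,
}

impl SplitSentence {
    fn new(sentence: String) -> Self {
        let base = sentence.as_ptr() as usize;
        let ranges = sentence
            .split_whitespace()
            .map(|word| {
                // Each word is a subslice, so its offset is a pointer difference.
                let start = word.as_ptr() as usize - base;
                start..start + word.len()
            })
            .collect();
        // The borrow of `sentence` ended with `collect`, so it can move now.
        Self { sentence, ranges }
    }

    /// Re-creates the `&str` slices on demand from the stored ranges.
    fn words(&self) -> impl Iterator<Item = &str> + '_ {
        self.ranges.iter().map(move |r| &self.sentence[r.clone()])
    }
}

fn get_closure_with_ranges(sentence: String) -> impl Fn() -> usize + 'static {
    let data = SplitSentence::new(sentence);
    move || {
        let mut count = 0;
        for word in data.words() {
            println!("{word}");
            count += 1;
        }
        count
    }
}
```

The split still happens only once, when SplitSentence::new runs; each call of the closure just re-slices the owned String.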