在序列元素只能与相邻元素交换的要求下,如何计算两个序列的数值距离?不允许插入、删除。距离函数必须确定从一个字符串到达另一个字符串所需的交换操作的数量。此外,对于不同长度的序列,或者如果序列不共享相同的符号,则没有定义距离函数。最后,您可以假设序列中没有重复项。
根据我的研究,我需要像Damerau–Levenshtein距离这样的东西,但我不知道如何正确修改距离函数。
对于序列
ABC
,距离为 1
的唯一两个有效变换是 BAC
或 ACB
。
use std::collections::HashSet;
use std::hash::Hash;
fn dist<T: PartialEq + Eq + Hash>(lhs: &[T], rhs: &[T]) -> Option<usize> {
// Distance function is not defined for differing lengths
if lhs.len() != rhs.len() {
return None;
}
let lhs_c: HashSet<_> = lhs.iter().collect();
let rhs_c: HashSet<_> = rhs.iter().collect();
// Distance function is not defined if an element occurs more than once in each sequence
if lhs.len() != lhs_c.len() || rhs.len() != rhs_c.len() {
return None;
}
// Distance function is not defined if the sequences don't share all elements
if lhs_c != rhs_c {
return None;
}
// Calculate the edit distance of `lhs` and `rhs` with only adjacent transpositions.
todo!()
}
#[test]
fn dist_is_none_for_differing_lengths() {
assert_eq!(None, dist(&["b", "a", "c"], &["a", "b", "c", "d"]));
}
#[test]
fn dist_is_none_with_duplicates() {
assert_eq!(None, dist(&["b", "a", "c", "a"], &["a", "b", "c", "d"]));
assert_eq!(None, dist(&["a", "b", "c", "d"], &["a", "b", "c", "c"]));
}
#[test]
fn dist_is_none_with_non_empty_difference() {
assert_eq!(None, dist(&["a", "b", "c"], &["d", "e", "f"]));
}
#[test]
fn dist_bac_to_abc_is_one() {
assert_eq!(Some(1), dist(&["b", "a", "c"], &["a", "b", "c"]));
}
#[test]
fn dist_bca_to_bac_is_one() {
assert_eq!(Some(1), dist(&["b", "c", "a"], &["b", "a", "c"]));
}
#[test]
fn dist_cab_to_abc_is_two() {
assert_eq!(Some(2), dist(&["c", "a", "b"], &["a", "b", "c"]));
}
#[test]
fn dist_cdab_to_abcd_is_four() {
assert_eq!(Some(4), dist(&["c", "d", "a", "b"], &["a", "b", "c", "d"]));
}
您可以创建从元素到数字的映射,这样当它应用于其中一个序列时,它将被排序。然后您可以将此映射应用于其他序列。之后,您可以计算对第二个序列进行冒泡排序所需的交换次数:(plaground)
use std::collections::HashMap;
use core::hash::Hash;
use std::collections::HashSet;
fn create_mapping<T: Eq + Hash>(seq: &[T]) -> HashMap<&T, usize> {
let mut ret = HashMap::new();
for (i, elem) in seq.iter().enumerate() {
ret.insert(elem, i);
}
ret
}
fn dist<T: Eq + Hash>(lhs: &[T], rhs: &[T]) -> Option<usize> {
// Distance function is not defined for differing lengths
if lhs.len() != rhs.len() {
return None;
}
let lhs_c: HashSet<_> = lhs.iter().collect();
let rhs_c: HashSet<_> = rhs.iter().collect();
// Distance function is not defined if an element occurs more than once in each sequence
if lhs.len() != lhs_c.len() || rhs.len() != rhs_c.len() {
return None;
}
// Distance function is not defined if the sequences don't share all elements
if lhs_c != rhs_c {
return None;
}
// Calculate the edit distance of `lhs` and `rhs` with only adjacent transpositions.
let mapping = create_mapping(lhs);
let rhs_mapped: Vec<usize> = rhs.iter().map(|e|mapping[e]).collect();
Some(bubble_sort_length(&rhs_mapped))
}
fn bubble_sort_length(seq: &[usize]) -> usize {
let mut ret = 0;
for i in 0..seq.len() {
for j in (i+1)..seq.len() {
if seq[i]>seq[j] {
ret += 1;
}
}
}
ret
}
#[test]
fn dist_is_none_for_differing_lengths() {
assert_eq!(None, dist(&["b", "a", "c"], &["a", "b", "c", "d"]));
}
#[test]
fn dist_is_none_with_duplicates() {
assert_eq!(None, dist(&["b", "a", "c", "a"], &["a", "b", "c", "d"]));
assert_eq!(None, dist(&["a", "b", "c", "d"], &["a", "b", "c", "c"]));
}
#[test]
fn dist_is_none_with_non_empty_difference() {
assert_eq!(None, dist(&["a", "b", "c"], &["d", "e", "f"]));
}
#[test]
fn dist_bac_to_abc_is_one() {
assert_eq!(Some(1), dist(&["b", "a", "c"], &["a", "b", "c"]));
}
#[test]
fn dist_bca_to_bac_is_one() {
assert_eq!(Some(1), dist(&["b", "c", "a"], &["b", "a", "c"]));
}
#[test]
fn dist_cab_to_abc_is_two() {
assert_eq!(Some(2), dist(&["c", "a", "b"], &["a", "b", "c"]));
}
#[test]
fn dist_cdab_to_abcd_is_four() {
assert_eq!(Some(4), dist(&["c", "d", "a", "b"], &["a", "b", "c", "d"]));
}
该算法的时间复杂度为
O(n²)
,其中 n
是序列的长度。可能有一些更好的方法来实现 bubble_sort_length
函数(参见 https://www.geeksforgeeks.org/inversion-count-in-array-using-merge-sort/ ,它声称具有时间复杂度 O(n*log(n))
)