计算具有相邻转位且无插入或删除的序列的距离

问题描述 投票:0回答:1

在序列元素只能与相邻元素交换的要求下,如何计算两个序列的数值距离?不允许插入、删除。距离函数必须确定从一个字符串到达另一个字符串所需的交换操作的数量。此外,对于不同长度的序列,或者如果序列不共享相同的符号,则没有定义距离函数。最后,您可以假设序列中没有重复项。

根据我的研究,我需要像Damerau–Levenshtein距离这样的东西,但我不知道如何正确修改距离函数。

示例

对于序列

ABC
,距离为
1
的唯一两个有效变换是
BAC
ACB

代码

铁锈游乐场

use std::collections::HashSet;
use std::hash::Hash;

fn dist<T: PartialEq + Eq + Hash>(lhs: &[T], rhs: &[T]) -> Option<usize> {
    // Distance function is not defined for differing lengths
    if lhs.len() != rhs.len() {
        return None;
    }

    let lhs_c: HashSet<_> = lhs.iter().collect();
    let rhs_c: HashSet<_> = rhs.iter().collect();

    // Distance function is not defined if an element occurs more than once in each sequence
    if lhs.len() != lhs_c.len() || rhs.len() != rhs_c.len() {
        return None;
    }

    // Distance function is not defined if the sequences don't share all elements
    if lhs_c != rhs_c {
        return None;
    } 

    // Calculate the edit distance of `lhs` and `rhs` with only adjacent transpositions.
    todo!()
}

#[test]
fn dist_is_none_for_differing_lengths() {
    assert_eq!(None, dist(&["b", "a", "c"], &["a", "b", "c", "d"]));
}

#[test]
fn dist_is_none_with_duplicates() {
    assert_eq!(None, dist(&["b", "a", "c", "a"], &["a", "b", "c", "d"]));
    assert_eq!(None, dist(&["a", "b", "c", "d"], &["a", "b", "c", "c"]));
}

#[test]
fn dist_is_none_with_non_empty_difference() {
    assert_eq!(None, dist(&["a", "b", "c"], &["d", "e", "f"]));
}

#[test]
fn dist_bac_to_abc_is_one() {
    assert_eq!(Some(1), dist(&["b", "a", "c"], &["a", "b", "c"]));
}

#[test]
fn dist_bca_to_bac_is_one() {
    assert_eq!(Some(1), dist(&["b", "c", "a"], &["b", "a", "c"]));
}

#[test]
fn dist_cab_to_abc_is_two() {
    assert_eq!(Some(2), dist(&["c", "a", "b"], &["a", "b", "c"]));
}

#[test]
fn dist_cdab_to_abcd_is_four() {
    assert_eq!(Some(4), dist(&["c", "d", "a", "b"], &["a", "b", "c", "d"]));
}
rust edit-distance
1个回答
0
投票

您可以创建从元素到数字的映射,这样当它应用于其中一个序列时,它将被排序。然后您可以将此映射应用于其他序列。之后,您可以计算对第二个序列进行冒泡排序所需的交换次数:(plaground)

use std::collections::HashMap;
use core::hash::Hash;
use std::collections::HashSet;

fn create_mapping<T: Eq + Hash>(seq: &[T]) -> HashMap<&T, usize> {
    let mut ret = HashMap::new();
    for (i, elem) in seq.iter().enumerate() {
        ret.insert(elem, i);
    }
    ret
}

fn dist<T: Eq + Hash>(lhs: &[T], rhs: &[T]) -> Option<usize> {
    // Distance function is not defined for differing lengths
    if lhs.len() != rhs.len() {
        return None;
    }
    let lhs_c: HashSet<_> = lhs.iter().collect();
    let rhs_c: HashSet<_> = rhs.iter().collect();
    // Distance function is not defined if an element occurs more than once in each sequence
    if lhs.len() != lhs_c.len() || rhs.len() != rhs_c.len() {
        return None;
    }
    // Distance function is not defined if the sequences don't share all elements
    if lhs_c != rhs_c {
        return None;
    } 
    // Calculate the edit distance of `lhs` and `rhs` with only adjacent transpositions.
    let mapping = create_mapping(lhs);
    let rhs_mapped: Vec<usize> = rhs.iter().map(|e|mapping[e]).collect();
    Some(bubble_sort_length(&rhs_mapped))
}

fn bubble_sort_length(seq: &[usize]) -> usize {
    let mut ret = 0;
    for i in 0..seq.len() {
        for j in (i+1)..seq.len() {
            if seq[i]>seq[j] {
                ret += 1;
            }
        }
    }
    ret
}

#[test]
fn dist_is_none_for_differing_lengths() {
    assert_eq!(None, dist(&["b", "a", "c"], &["a", "b", "c", "d"]));
}

#[test]
fn dist_is_none_with_duplicates() {
    assert_eq!(None, dist(&["b", "a", "c", "a"], &["a", "b", "c", "d"]));
    assert_eq!(None, dist(&["a", "b", "c", "d"], &["a", "b", "c", "c"]));
}

#[test]
fn dist_is_none_with_non_empty_difference() {
    assert_eq!(None, dist(&["a", "b", "c"], &["d", "e", "f"]));
}

#[test]
fn dist_bac_to_abc_is_one() {
    assert_eq!(Some(1), dist(&["b", "a", "c"], &["a", "b", "c"]));
}

#[test]
fn dist_bca_to_bac_is_one() {
    assert_eq!(Some(1), dist(&["b", "c", "a"], &["b", "a", "c"]));
}

#[test]
fn dist_cab_to_abc_is_two() {
    assert_eq!(Some(2), dist(&["c", "a", "b"], &["a", "b", "c"]));
}

#[test]
fn dist_cdab_to_abcd_is_four() {
    assert_eq!(Some(4), dist(&["c", "d", "a", "b"], &["a", "b", "c", "d"]));
}

该算法的时间复杂度为

O(n²)
,其中
n
是序列的长度。可能有一些更好的方法来实现
bubble_sort_length
函数(参见 https://www.geeksforgeeks.org/inversion-count-in-array-using-merge-sort/ ,它声称具有时间复杂度
O(n*log(n))
)

© www.soinside.com 2019 - 2024. All rights reserved.