What distance measure can I use to compare the order of these lists?

I have several lists of IDs, all stored as strings. They are:

list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "1"]

What measure can I use to determine which lists are closest in order? Ideally, list1 and list2 should come out as the closest pair here.

Would Spearman's rank correlation make sense here?
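For reference, here is a minimal sketch of Spearman's rank correlation for lists like these, using only the standard library. It assumes each list is a permutation of the same distinct IDs (no ties), so an ID's index serves as its rank and the closed form rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies. The names `spearman_rho` and `named` are hypothetical, chosen for illustration only.

```python
from itertools import combinations
from typing import Dict, List


def spearman_rho(lhs: List[str], rhs: List[str]) -> float:
    """Spearman's rank correlation between two orderings of the same IDs.

    Assumes both lists are permutations of the same distinct IDs (no ties),
    so each ID's index is its rank and the closed form
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies.
    """
    n = len(lhs)
    if n < 2 or len(set(lhs)) != n or len(rhs) != n or set(lhs) != set(rhs):
        raise ValueError("expected two orderings of the same distinct IDs")
    # Rank (position) of every ID in the second list.
    rank_rhs: Dict[str, int] = {item: idx for idx, item in enumerate(rhs)}
    # d = an ID's rank in lhs minus its rank in rhs.
    sum_d2 = sum((idx - rank_rhs[item]) ** 2 for idx, item in enumerate(lhs))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "1"]

named = {"list1": list1, "list2": list2, "list3": list3, "list4": list4}
for a, b in combinations(named, 2):
    # Higher rho means more similar order (+1 identical, -1 fully reversed).
    print(f"rho[{a}, {b}] = {spearman_rho(named[a], named[b]):.2f}")
```

Unlike a distance, rho is a similarity, so higher means closer; on these four lists list1/list2 scores 0.9, the highest of the six pairs. If a distance is needed, converting it (for example 1 - rho) is a design choice, not something the statistic prescribes.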

Tags: python, distance, levenshtein-distance
1 Answer

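Edit (Levenshtein) distance seems like a good candidate for this kind of metric: it counts the minimum number of single-element insertions, deletions, and substitutions needed to turn one list into the other, so lists that differ only by a small local reordering get a small distance.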

from typing import List


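def calcEditDistance(lhs: List[str], rhs: List[str]) -> int:
    """Levenshtein distance between two sequences (unit-cost insert, delete, substitute)."""
    m = len(lhs)
    n = len(rhs)
    # dp[i][j] := min number of operations to convert lhs[0..i) to rhs[0..j)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Turning a prefix of length i into the empty list takes i deletions.
    for i in range(1, m + 1):
        dp[i][0] = i

    # Building a prefix of length j from the empty list takes j insertions.
    for j in range(1, n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if lhs[i - 1] == rhs[j - 1]:
                # Elements match: no extra cost.
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # 1 + cheapest of substitute, delete, insert.
                dp[i][j] = min(dp[i - 1][j - 1],   # substitute
                               dp[i - 1][j],       # delete from lhs
                               dp[i][j - 1]) + 1   # insert into lhs

    return dp[m][n]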


list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "1"]

res = calcEditDistance(list1, list2)
print(f"dis[1, 2] = {res}")

res = calcEditDistance(list1, list3)
print(f"dis[1, 3] = {res}")

res = calcEditDistance(list1, list4)
print(f"dis[1, 4] = {res}")

res = calcEditDistance(list2, list3)
print(f"dis[2, 3] = {res}")

res = calcEditDistance(list2, list4)
print(f"dis[2, 4] = {res}")

res = calcEditDistance(list3, list4)
print(f"dis[3, 4] = {res}")

Output:

dis[1, 2] = 2
dis[1, 3] = 4
dis[1, 4] = 4
dis[2, 3] = 4
dis[2, 4] = 4
dis[3, 4] = 4

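This matches your intuition: list1 and list2 are the closest pair, at distance 2 (their last two IDs are swapped, which costs two substitutions).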
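With more than a handful of lists, it can help to pick the closest pair programmatically instead of reading it off the printed table. The following is a minimal sketch under that assumption: the hypothetical `edit_distance` computes the same Levenshtein distance as `calcEditDistance` above but keeps only two DP rows, and the hypothetical `closest_pair` compares every pair via `itertools.combinations`.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def edit_distance(lhs: List[str], rhs: List[str]) -> int:
    """Levenshtein distance keeping only two DP rows (O(len(rhs)) memory)."""
    prev = list(range(len(rhs) + 1))  # row for the empty prefix of lhs
    for i, a in enumerate(lhs, start=1):
        curr = [i] + [0] * len(rhs)  # curr[0]: delete all i elements
        for j, b in enumerate(rhs, start=1):
            if a == b:
                curr[j] = prev[j - 1]  # match: no extra cost
            else:
                # 1 + cheapest of substitute, delete, insert
                curr[j] = 1 + min(prev[j - 1], prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def closest_pair(named: Dict[str, List[str]]) -> Tuple[str, str, int]:
    """Return (name_a, name_b, distance) for the pair with the smallest distance.

    Ties go to the pair that comes first in insertion order.
    """
    return min(
        ((a, b, edit_distance(named[a], named[b])) for a, b in combinations(named, 2)),
        key=lambda t: t[2],
    )


lists = {
    "list1": ["1", "2", "3", "4", "5"],
    "list2": ["1", "2", "3", "5", "4"],
    "list3": ["1", "5", "4", "3", "2"],
    "list4": ["4", "2", "5", "3", "1"],
}
print(closest_pair(lists))  # ('list1', 'list2', 2)
```

Because `min` keeps the first minimum it sees, tied pairs are not reported; if you need all of them, collect every pair whose distance equals the minimum instead.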

© www.soinside.com 2019 - 2024. All rights reserved.