I have several lists of IDs, all of which are strings. They are as follows:
list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "1"]
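What measure can I use to determine which lists are closest in order? Ideally, list1 and list2 should come out as the closest pair here.
Does Spearman correlation make sense here?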
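Edit distance seems like a good candidate for this kind of metric. It counts the minimum number of insertions, deletions, and substitutions needed to turn one list into the other.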
from typing import List

def calcEditDistance(lhs: List[str], rhs: List[str]) -> int:
    m = len(lhs)
    n = len(rhs)
    # dp[i][j] := minimum number of operations to convert lhs[0..i) to rhs[0..j)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    # Converting a prefix of lhs to the empty list takes i deletions.
    for i in range(1, m + 1):
        dp[i][0] = i
    # Building a prefix of rhs from the empty list takes j insertions.
    for j in range(1, n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if lhs[i - 1] == rhs[j - 1]:
                # Elements match, so no extra operation is needed.
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # Cheapest of substitution, deletion, and insertion.
                dp[i][j] = min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1]) + 1
    return dp[m][n]

list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "1"]
res = calcEditDistance(list1, list2)
print(f"dis[1, 2] = {res}")
res = calcEditDistance(list1, list3)
print(f"dis[1, 3] = {res}")
res = calcEditDistance(list1, list4)
print(f"dis[1, 4] = {res}")
res = calcEditDistance(list2, list3)
print(f"dis[2, 3] = {res}")
res = calcEditDistance(list2, list4)
print(f"dis[2, 4] = {res}")
res = calcEditDistance(list3, list4)
print(f"dis[3, 4] = {res}")
This prints:
dis[1, 2] = 2
dis[1, 3] = 4
dis[1, 4] = 4
dis[2, 3] = 4
dis[2, 4] = 4
dis[3, 4] = 4
This matches your intuition: list1 and list2 are the closest pair, at a distance of 2.
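On the Spearman question: because every list is a permutation of the same IDs, Spearman's rank correlation can be computed from each ID's position in the two lists. The following is only a minimal sketch, not part of the answer above. The function name spearman_rho is hypothetical, it assumes both lists contain exactly the same unique IDs (so there are no ties), and it uses list4 as given in the question.

```python
from typing import Dict, List


def spearman_rho(lhs: List[str], rhs: List[str]) -> float:
    """Spearman's rank correlation between two orderings of the same unique IDs."""
    # Assumption: both lists are permutations of one set of unique IDs.
    if len(set(lhs)) != len(lhs) or len(lhs) != len(rhs) or set(lhs) != set(rhs):
        raise ValueError("both lists must contain the same unique IDs")
    n = len(lhs)
    if n < 2:
        raise ValueError("need at least two IDs")
    # The rank of an ID is simply its position in the list.
    pos_in_rhs: Dict[str, int] = {item: i for i, item in enumerate(rhs)}
    # Sum of squared position differences over all IDs.
    d_squared = sum((i - pos_in_rhs[item]) ** 2 for i, item in enumerate(lhs))
    # Shortcut formula, valid only because there are no ties.
    return 1 - 6 * d_squared / (n * (n * n - 1))


list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "1"]

print(f"rho[1, 2] = {spearman_rho(list1, list2):.2f}")  # 0.90
print(f"rho[1, 3] = {spearman_rho(list1, list3):.2f}")  # 0.00
print(f"rho[1, 4] = {spearman_rho(list1, list4):.2f}")  # -0.50
```

Under these assumptions list1 and list2 again come out as the most similar pair. Unlike edit distance, this score also reflects how far each ID has moved, so it can tell apart pairs that edit distance rates as equally distant.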