Finding the similarity between two lists, weighting each value by its position in the list

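I am trying to compute a similarity value for one list compared against another, much like the Jaccard similarity of two sentences. The only difference is that a value found at the same index in both lists gets a fixed (static) weight, while a value found at a different index has its weight penalised according to how far it is from that index.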

a=["are","you","are","you","why"]
b=['you',"are","you",'are',"why"]
li=[]
va=[]
fi=[]
weightOfStatic=1/len(a)
for i in range(len(a)):
    if a[i]==b[i]:
        print("true1", weightOfStatic,a[i],b[i])
        fi.append({"static":i, "dynamic":i,"Weight":weightOfStatic})
        li.append([weightOfStatic,a[i],b[i]])
        va.append(li)
    else:
        for j in range(len(b)):
            if a[i]==b[j]:
                weightOfDynamic = weightOfStatic*(1-(1/len(b))*abs(i-j))
                fi.append({"static":i, "dynamic":j,"Weight":weightOfDynamic})
                print("true2 and index difference between words =%d"% abs(i-j),weightOfDynamic, i,j)
                li.append([weightOfDynamic,a[i],b[j]])
                va.append(weightOfDynamic)

sim_value=sum(va)
print("The similarity value is = %f" %(sim_value))

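The code above works well when there are no repeated words, for example a = ["how", "are", "you"] and b = ["you", "are", "how"]. For this sentence it gives a similarity value of about 0.5.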
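As a quick check of that figure, the per-word weight can be written as a small helper and summed. This is only a sketch: the helper name `pair_weight` is made up for illustration, and it assumes the second list is `["you", "are", "how"]` and that both lists have the same length.

```python
def pair_weight(i, j, n):
    """Weight of a word at index i in a matched to index j in b (lists of length n)."""
    static = 1 / n
    # Same formula as in the question: the static weight shrinks with the distance.
    return static * (1 - abs(i - j) / n)

a = ["how", "are", "you"]
b = ["you", "are", "how"]
n = len(a)

# Each word occurs exactly once in b here, so b.index() finds its only match.
total = sum(pair_weight(i, b.index(word), n) for i, word in enumerate(a))
print(round(total, 3))  # 0.556
```

The exact value is 5/9 (1/9 for "how", 1/3 for "are", 1/9 for "you"), which is presumably what the question rounds down to 0.5.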

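What I expect for the example above, comparing lists A and B, is this: if list A contains repeated words, each value in A should be matched to the nearest index in B. This is how the code above matches the example instead: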

      {'static': 0, 'dynamic': 1, 'Weight': 0.160}
 here 0 should not match with 3 again
      {'static': 0, 'dynamic': 3, 'Weight': 0.079}
      {'static': 1, 'dynamic': 0, 'Weight': 0.160}
 same for 1 and 2
      {'static': 1, 'dynamic': 2, 'Weight': 0.160}
 dynamic 1 is already taken here
      {'static': 2, 'dynamic': 1, 'Weight': 0.160}
      {'static': 2, 'dynamic': 3, 'Weight': 0.160}
 dynamic 0 is already taken
      {'static': 3, 'dynamic': 0, 'Weight': 0.079}
      {'static': 3, 'dynamic': 2, 'Weight': 0.160}
      [0.2, 'why', 'why'] 

Here the total weight comes to 1.3200 (the weight should stay between 0 and 1).
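The overshoot comes from counting every matching pair rather than one pair per word. A rough reconstruction of that counting reproduces the 1.32. The function name `all_pairs_total` is hypothetical, and the logic is inferred from the listing above, assuming both lists have the same length.

```python
def all_pairs_total(a, b):
    """Sum the weight of every (i, j) pair with a[i] == b[j], as in the listing."""
    n = len(b)
    static = 1 / len(a)
    total = 0.0
    for i, word in enumerate(a):
        if word == b[i]:
            total += static  # same index: a single static weight
            continue
        for j, other in enumerate(b):
            if word == other:  # every occurrence in b is counted, not just one
                total += static * (1 - abs(i - j) / n)
    return total

a = ["are", "you", "are", "you", "why"]
b = ["you", "are", "you", "are", "why"]
print(round(all_pairs_total(a, b), 2))  # 1.32
```

Each of the first four words contributes two pairs, which is why the sum can exceed 1.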

Instead, the result should be:

      {'static': 0, 'dynamic': 1, 'Weight': 0.160}
      {'static': 1, 'dynamic': 0, 'Weight': 0.160}
      {'static': 2, 'dynamic': 3, 'Weight': 0.160}
      {'static': 3, 'dynamic': 2, 'Weight': 0.160}
      [0.2, 'why', 'why'] 

with a total weight of 0.84.
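One way to get exactly this pairing is to let each index of B be used at most once. The sketch below is an illustration only, not code from the question: the name `one_to_one_similarity` is invented, reserving same-index matches first and breaking ties towards the lower index are assumptions, and a greedy pass like this is not guaranteed to find the best possible pairing for every input. It also assumes non-empty lists of equal length.

```python
def one_to_one_similarity(a, b):
    """Greedily match each word in a to its nearest still-unused index in b."""
    n = len(b)
    static = 1 / len(a)
    # Assumption: words sitting at the same index in both lists are paired first.
    fixed = {i for i in range(min(len(a), n)) if a[i] == b[i]}
    used = set(fixed)  # indices of b that are already taken
    pairs = [(i, i) for i in sorted(fixed)]
    for i, word in enumerate(a):
        if i in fixed:
            continue
        free = [j for j in range(n) if b[j] == word and j not in used]
        if free:
            # Nearest free index; ties go to the lower index.
            j = min(free, key=lambda k: (abs(i - k), k))
            used.add(j)
            pairs.append((i, j))
    total = sum(static * (1 - abs(i - j) / n) for i, j in pairs)
    return total, sorted(pairs)

a = ["are", "you", "are", "you", "why"]
b = ["you", "are", "you", "are", "why"]
total, pairs = one_to_one_similarity(a, b)
print(pairs)            # [(0, 1), (1, 0), (2, 3), (3, 2), (4, 4)]
print(round(total, 2))  # 0.84
```

Because every index of B is consumed at most once and each pair weighs at most 1/len(a), the total cannot exceed 1 when the lists have equal length.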

python
1 Answer

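First, I "prettified" your code so that it looks more Pythonic. :) I think you overcomplicated it a bit. In fact, it did not even run for me, because you try to sum a list that contains both numbers and a list.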

a = ['are','you','are','you','why']
b = ['you','are','you','are','why']

total_weight = 0
weight_of_static = 1/len(a)
for i, a_word in enumerate(a):
    if a_word == b[i]:
        print('{0} <-> {1} => static\t\t// weight: {2:.2f}'.format(a_word, b[i], weight_of_static))
        total_weight += weight_of_static
    else:
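        # Distances from i to every position in b that holds the same word.
        distances = []
        for j, b_word in enumerate(b):
            if a_word == b_word:
                distances.append(abs(i - j))

        if not distances:
            # a_word never occurs in b, so it adds no weight (and min() would fail).
            continue

        # Penalise the static weight by the distance to the nearest match.
        dynamic_weight = weight_of_static*(1 - ( 1 / len(b)) * min(distances))
        total_weight += dynamic_weight
        print('{0} <-> {1} => not static\t// weight: {2:.2f}'.format(a_word, b[i], dynamic_weight))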

print('The similarity value is = {0:.2f}'.format(total_weight))
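  • First I declare a total_weight variable to keep track of the weight. Then I make use of the enumerate function, so I get the index and the element at the same time.
  • If the two words are the same at the same index, it is simple :)
  • If not, we loop over the second list just as you did, but we have to keep track of the matches in the distances variable, because otherwise a[3] would match b[0] rather than the closer b[2].
  • After that we just compute the dynamic weight with your formula (I left it a bit verbose so you can see it more clearly). The only difference is that we use the minimum distance (min(distances)).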
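The steps above can be condensed into a single function. This is a sketch rather than the code as posted: the name `nearest_match_similarity` is invented here, skipping words that never occur in b is an assumption, and the same-index case is folded in as a distance of 0.

```python
def nearest_match_similarity(a, b):
    """Weight each word of a by the distance to its nearest occurrence in b."""
    n = len(b)
    static = 1 / len(a)
    total = 0.0
    for i, word in enumerate(a):
        distances = [abs(i - j) for j, other in enumerate(b) if other == word]
        if distances:  # assumption: a word missing from b contributes nothing
            # Distance 0 (same index) gives the full static weight.
            total += static * (1 - min(distances) / n)
    return total

a = ["are", "you", "are", "you", "why"]
b = ["you", "are", "you", "are", "why"]
print(round(nearest_match_similarity(a, b), 2))  # 0.84

# Taking the minimum distance does not by itself stop two words in a
# from sharing one index of b: here b[0] serves both a[0] and a[1].
print(nearest_match_similarity(["x", "x"], ["x", "y"]))  # 0.75
```

For the lists in the question this gives the expected 0.84, but the second call shows that the nearest-match rule alone is not a strict one-to-one pairing.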

Here is my sample output:

$ python similarity.py
are <-> you => not static       // weight: 0.16
you <-> are => not static       // weight: 0.16
are <-> you => not static       // weight: 0.16
you <-> are => not static       // weight: 0.16
why <-> why => static           // weight: 0.20
The similarity value is = 0.84   

I hope this helps.
