What is an efficient way to check whether a word is close to a word in a string?

Question

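Consider the following examples:

Example 1:
```
str1 = "wow...it looks amazing"
str2 = "looks amazi"
```
Here "amazi" is close to "amazing"; str2 was mistyped. I want to write a program that tells me "amazi" is close to "amazing", so that I can replace "amazi" in str2 with "amazing".

Example 2:
```
str1 = "is looking good"
str2 = "looks goo"
```
In this case the updated str2 will be "looking good".

Example 3:
```
str1 = "you are really looking good"
str2 = "lok goo"
```
In this case str2 will be "good", because "lok" is not close to "looking" (although it would also solve my problem if the program converted "lok" to "looking" here).

Example 4:
```
str1 = "Stu is actually SEVERLY sunburnt....it hurts!!!"
str2 = "hurts!!"
```
The updated str2 will be "hurts!!!".

Example 5:
```
str1 = "you guys were absolutely amazing tonight, a..."
str2 = "ly amazin"
```
The updated str2 will be "amazing"; "ly" should be removed or replaced with "absolutely".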

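What algorithm and code would do this?

Maybe we could compare the words character by character and set a threshold, say 0.8 or 80%, so that if word2 shares 80% of word1's characters in sequence, we replace word2 in str2 with the word from str1? Is there any other efficient solution in Python?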
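The threshold idea above can be sketched with the standard library. This is only a rough illustration, not the exact "80% consecutive characters" rule: it assumes `difflib.SequenceMatcher.ratio()` as the similarity measure, and the function name `closest_words` and its `threshold` parameter are made up for this example.

```python
from difflib import SequenceMatcher


def closest_words(str1, str2, threshold=0.8):
    """For each word in str2, return the most similar word in str1.

    Words whose best similarity is below the threshold are dropped.
    (Hypothetical helper, not part of any library.)
    """
    candidates = str1.split()
    if not candidates:
        return []
    replaced = []
    for word in str2.split():
        # ratio() is 2*M/T: M = number of matched characters,
        # T = combined length of both words.
        scored = [(SequenceMatcher(None, word, c).ratio(), c) for c in candidates]
        best_score, best_word = max(scored, key=lambda pair: pair[0])
        if best_score >= threshold:
            replaced.append(best_word)
    return replaced


print(closest_words("wow...it looks amazing", "looks amazi"))
```

With a threshold of 0.8 this measure is fairly strict: "looks" scores only about 0.67 against "looking", so Example 2 would need a lower threshold or a different measure.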

Answer 1

There are many ways to solve this. This one handles all of your examples. I added a minimum similarity filter so that only higher-quality matches are returned. That is what allows "ly" to be dropped in the last example, since it is not close to any of the words.

You can install python-Levenshtein with pip (see its documentation):

```
pip install python-Levenshtein
```

```
import Levenshtein

def find_match(str1, str2):
    min_similarity = .75
    output = []
    words = str1.split()
    # One row per word in str2, holding its Jaro-Winkler similarity to every word in str1
    results = [[Levenshtein.jaro_winkler(x, y) for x in words] for y in str2.split()]
    for x in results:
        # Keep the best-matching word from str1 only if it is similar enough
        if max(x) >= min_similarity:
            output.append(words[x.index(max(x))])
    return output
```

For each of the examples you gave:

```
find_match("is looking good", "looks goo")
['looking','good']

find_match("you are really looking good", "lok goo")
['looking','good']

find_match("Stu is actually SEVERLY sunburnt....it hurts!!!", "hurts!!")
['hurts!!!']

find_match("you guys were absolutely amazing tonight, a...", "ly amazin")
['amazing']
```

Answer 2

I did it using regular expressions, like this:

```
import re

def check_regex(str1, str2):
    # New list to store the updated value
    str_new = []
    for i in str2:
        # Regular expressions for comparing the strings.
        # Note: i is used as a raw pattern, so any regex metacharacters in it are interpreted.
        x = ['[' + i + ']', '^' + i, i + '$', '(' + i + ')']
        for k in x:
            matched = False
            for j in str1:
                found = "".join(re.findall(k, j))
                # Conditions to make sure the word is close enough to the particular word
                if found == i or (found in i and abs(len(found) - len(i)) == 1 and len(i) != 2):
                    str_new.append(j)
                    matched = True
                    break
            if matched:
                break
    return str_new

str1 = input().split()
str2 = input().split()
print(" ".join(check_regex(str1, str2)))
```

Answer 3

```
str1 = "wow...it looks amazing"
str2 = "looks amazi"
str3 = []

# Checking for similar strings in both strings:
for n in str1.split():
    for m in str2.split():
        if m in n:
            str3.append(n)

# If found 2 similar strings:
if len(str3) == 2:
    # If their indexes align:
    if str1.split().index(str3[1]) - str1.split().index(str3[0]) == 1:
        print(' '.join(str3))

elif len(str3) == 1:
    print(str3[0])
```

Output:

```
looks amazing
```

Update, based on the conditions given by the OP:

```
str1 = "good..."
str2 = "god.."
str3 = []

# Checking for similar strings in both strings:
for n in str1.split():
    for m in str2.split():
        # Collecting the characters of m that also occur in n:
        c = ''
        for i in m:
            if i in n:
                c += i
        # If the number of matching characters is at least 50% of the length of the word from str1,
        # or the word from str2 is contained in the word from str1:
        if len(c) >= len(n) * 0.50 or m in n:
            str3.append(n)

# If found 2 similar strings:
if len(str3) == 2:
    # If their indexes align:
    if str1.split().index(str3[1]) - str1.split().index(str3[0]) == 1:
        print(' '.join(str3))

elif len(str3) == 1:
    print(str3[0])
```

Answer 4

In this case you can use the Jaccard coefficient. First, split the first and second strings on whitespace. Then, for each word in str2, compute the Jaccard coefficient against every word in str1, and replace it with the word that has the highest Jaccard coefficient.
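The Jaccard approach described in this answer could look roughly like the sketch below. The answer does not say which sets to compare, so this assumes each word is treated as the set of its characters; the function names `jaccard` and `best_matches` are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient of two words, compared as sets of characters
    (an assumption; sets of bigrams would be another option)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def best_matches(str1, str2):
    """Replace each word in str2 with the word in str1 that has the
    highest Jaccard coefficient (hypothetical helper)."""
    candidates = str1.split()
    if not candidates:
        return []
    return [max(candidates, key=lambda c: jaccard(word, c)) for word in str2.split()]


print(best_matches("is looking good", "looks goo"))
```

As written, every word in str2 gets a replacement, however poor the best match is. Dropping weak matches such as "ly" in Example 5 would need a minimum-similarity cutoff on top of this.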
