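Consider the following examples.

Example 1:
str1 = "wow...it looks amazing"
str2 = "looks amazi"
You can see that amazi is close to amazing; str2 was mistyped. I want to write a program that tells me amazi is close to amazing, so that I can then replace amazi with amazing in str2.

Example 2:
str1 = "is looking good"
str2 = "looks goo"
In this case the updated str2 will be "looking good".

Example 3:
str1 = "you are really looking good"
str2 = "lok goo"
In this case str2 will be "good", because lok is not close to looking (although if the program does convert lok to looking in this case, that is also fine for my problem).

Example 4:
str1 = "Stu is actually SEVERLY sunburnt....it hurts!!!"
str2 = "hurts!!"
The updated str2 will be "hurts!!!".

Example 5:
str1 = "you guys were absolutely amazing tonight, a..."
str2 = "ly amazin"
The updated str2 will be "amazing"; "ly" should be removed or replaced with absolutely.

What would the algorithm and code for this be?

Maybe we could look at the characters in sequence and set a threshold such as 0.8 (80%): if a word in str2 (word2) shares 80% of its characters, in order, with a word in str1 (word1), then replace word2 in str2 with that word from str1. Is there any other efficient solution, with Python code?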
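The threshold idea in the question can be tried with only the standard library. The sketch below is one possible reading of "80% sequential characters", not the only one: it uses difflib.SequenceMatcher.ratio(), which is 2*M/T (M = matched characters, T = combined length of both words) rather than a literal prefix test. The name closest_words and its threshold parameter are illustrative and do not come from the original post.

```python
from difflib import SequenceMatcher


def closest_words(str1, str2, threshold=0.8):
    """For each word in str2, keep the most similar word of str1
    if its similarity ratio reaches the threshold (assumed: 0.8)."""
    candidates = str1.split()
    kept = []
    for word2 in str2.split():
        # Score word2 against every word of str1
        scored = [
            (SequenceMatcher(None, word2, word1).ratio(), word1)
            for word1 in candidates
        ]
        if not scored:
            continue
        best_score, best_word = max(scored, key=lambda pair: pair[0])
        # Words with no sufficiently close candidate are dropped
        if best_score >= threshold:
            kept.append(best_word)
    return " ".join(kept)


print(closest_words("wow...it looks amazing", "looks amazi"))
print(closest_words("you guys were absolutely amazing tonight, a...", "ly amazin"))
```

With threshold=0.8, looks against looking scores about 0.67, so Example 2 would come out as "good" rather than "looking good". Lowering the threshold admits that pair at the cost of looser matches elsewhere.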
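There are many ways to approach this. This one handles every example you gave. I added a minimum-similarity filter so that only higher-quality matches are returned; that is what allows "ly" to be dropped in the last example, since it is not close to any of the words in str1.
You can install Levenshtein (see its documentation) with:
pip install python-Levenshtein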
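import Levenshtein

def find_match(str1, str2):
    min_similarity = .75  # matches scoring below this are dropped
    output = []
    words1 = str1.split()
    # One row per word of str2: its Jaro-Winkler similarity to each word of str1
    results = [[Levenshtein.jaro_winkler(x, y) for x in words1] for y in str2.split()]
    for x in results:
        # Keep the best-scoring str1 word only if it is similar enough
        if x and max(x) >= min_similarity:
            output.append(words1[x.index(max(x))])
    return output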
Results for the examples you presented:
find_match("is looking good", "looks goo")
['looking', 'good']
find_match("you are really looking good", "lok goo")
['looking', 'good']
find_match("Stu is actually SEVERLY sunburnt....it hurts!!!", "hurts!!")
['hurts!!!']
find_match("you guys were absolutely amazing tonight, a...", "ly amazin")
['amazing']
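To see why a 0.75 cutoff keeps lok/looking but drops ly, it can help to look at the measure itself. The following is a minimal pure-Python sketch of the textbook Jaro-Winkler definition; the function names jaro and jaro_winkler and the prefix_weight parameter are my own, and the library's implementation may differ in edge cases, so treat the numbers as indicative only.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1):
    """Jaro score boosted for a shared prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)


print(round(jaro_winkler("lok", "looking"), 4))
print(round(jaro_winkler("ly", "absolutely"), 4))
```

Under this definition lok scores well above 0.75 against looking because all three letters match in order and two of them form a shared prefix, while ly matches only a single character of absolutely inside the window and stays far below the cutoff.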
I did it using regular expressions, like this:
import re

def check_regex(str1, str2):
    # New list to store the updated value
    str_new = []
    for i in str2:
        # Regular expressions for comparing the strings; re.escape keeps
        # characters such as "." or "(" in the word from acting as regex syntax
        e = re.escape(i)
        x = ['[' + e + ']', '^' + e, e + '$', '(' + e + ')']
        for k in x:
            h = 0
            for j in str1:
                found = "".join(re.findall(k, j))
                # Conditions to make sure the word is close enough to the particular word
                if found == i or (found in i and abs(len(found) - len(i)) == 1 and len(i) != 2):
                    str_new.append(j)
                    h = 1
                    break
            if h == 1:
                break
    return str_new

str1 = input().split()
str2 = input().split()
print(" ".join(check_regex(str1, str2)))
Like this:
str1 = "wow...it looks amazing"
str2 = "looks amazi"
str3 = []

# Checking for similar strings in both strings:
for n in str1.split():
    for m in str2.split():
        if m in n:
            str3.append(n)

# If found 2 similar strings:
if len(str3) == 2:
    # If their indexes align:
    if str1.split().index(str3[1]) - str1.split().index(str3[0]) == 1:
        print(' '.join(str3))
elif len(str3) == 1:
    print(str3[0])

Output:
looks amazing

Update, according to the conditions given by the OP:
str1 = "good..."
str2 = "god.."
str3 = []

# Checking for similar strings in both strings:
for n in str1.split():
    for m in str2.split():
        # Collecting the characters of m that also occur in n:
        c = ''
        for i in m:
            if i in n:
                c += i
        # If the number of matching characters is at least 50% of the length
        # of the str1 word, or the str2 word is contained in the str1 word:
        if len(c) >= len(n) * 0.50 or m in n:
            str3.append(n)

# If found 2 similar strings:
if len(str3) == 2:
    # If their indexes align:
    if str1.split().index(str3[1]) - str1.split().index(str3[0]) == 1:
        print(' '.join(str3))
elif len(str3) == 1:
    print(str3[0])

In this case you can use the Jaccard coefficient. First, split the first and second strings on whitespace. Then, for each word in str2, compute the Jaccard coefficient against every word in str1, and replace it with the str1 word that gives the highest coefficient.
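The Jaccard approach could be sketched roughly as follows. This assumes each word is treated as a set of characters, which is my choice and not stated in the answer; character n-grams would be another option. The function names jaccard and replace_by_jaccard are illustrative.

```python
def jaccard(a, b):
    """Jaccard coefficient of the character sets of two words."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def replace_by_jaccard(str1, str2):
    """Replace every word of str2 with the str1 word that has the
    highest Jaccard coefficient (assumption: character-level sets)."""
    candidates = str1.split()
    if not candidates:
        return str2
    return " ".join(
        max(candidates, key=lambda word1: jaccard(word2, word1))
        for word2 in str2.split()
    )


print(replace_by_jaccard("wow...it looks amazing", "looks amazi"))
```

As written, every word of str2 is always replaced by something, so a stray fragment like "ly" would still be mapped to its best (poor) candidate. A minimum-score cutoff like the one in the Levenshtein answer would be needed to drop such words. Character sets also ignore letter order and repetition, so this is a coarser measure than an edit-distance one.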