How do I highlight the differences between two strings in Python?

Question

I want to use Python code to highlight, in color, the differences between two strings.

Example 1:

sentence1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
sentence2 = "I am enjoying the summer breeze on the beach while I am doing some pilates."

Expected result (the parts between asterisks should be red):

 I *am* enjoying the summer breeze on the beach while I *am doing* some pilates.

Example 2:

sentence1: "My favourite season is Autumn while my sister's favourite season is Winter."
sentence2: "My favourite season is Autumn, while my sister's favourite season is Winter."

Expected result (only the comma differs):

"My favourite season is Autumn*,* while my sister's favourite season is Winter."

I tried this:

sentence1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
sentence2 = "I'm enjoying the summer breeze on the beach while I am doing some pilates."

# Split the sentences into words
words1 = sentence1.split()
words2 = sentence2.split()

# Find the index where the sentences differ
index_of_difference = next((i for i, (word1, word2) in enumerate(zip(words1, words2)) if word1 != word2), None)

# Highlight differing part "am doing" in red
highlighted_words = []
for i, (word1, word2) in enumerate(zip(words1, words2)):
    if i == index_of_difference:
        highlighted_words.append('\033[91m' + word2 + '\033[0m')
    else:
        highlighted_words.append(word2)

highlighted_sentence = ' '.join(highlighted_words)
print(highlighted_sentence)

I got this:

I'm enjoying the summer breeze on the beach while I *am* doing some

Instead of this:

I'm enjoying the summer breeze on the beach while I *am doing* some pilates.

How can I fix this?

Answer 1

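I suggest using difflib's get_matching_blocks method to separate the matching substrings from the rest.

Here is an example in which I rebuild each sentence from scratch and wrap a substring in * whenever it lies outside a match.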

Code:

import difflib as dl


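def line_builder(blocks, sentence, a_or_b):
    # Rebuild `sentence`, wrapping every stretch outside a matching block in asterisks.
    new_sentence = ""
    position = 0
    for block in blocks:
        # Start of this match in sentence "a" or sentence "b"
        match_start = getattr(block, a_or_b)
        # The last block is a zero-size sentinel at the end of the sentence.
        # It is not skipped, so unmatched trailing text still gets marked.
        if match_start > position:
            new_sentence += f"*{sentence[position:match_start]}*"
        new_sentence += sentence[match_start: match_start + block.size]
        position = match_start + block.size
    return new_sentence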


def print_diffs(a, b):
    s = dl.SequenceMatcher(a=a, b=b)
    m = s.get_matching_blocks()
    new_a = line_builder(m, a, "a")
    new_b = line_builder(m, b, "b")
    print(f"Here are the differences\n\t{new_a}\n\t{new_b}")

Here is what it looks like in practice:

$ python -i text_diffs.py 
>>> print_diffs("This is the most fun I have ever had", "This was the most fun I could have ever had")
Here are the differences
    This *i*s the most fun I have ever had
    This *wa*s the most fun I *could *have ever had
>>>

Keep in mind that this may not work for every example. It was written quickly and may still need some fine-tuning for edge cases.
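If whole words should be marked rather than runs of characters, the same idea can be applied to lists of words. What follows is a minimal sketch, not part of the answer above: mark_word_diffs is a hypothetical helper name, and it assumes that collapsing whitespace through str.split() is acceptable.

```python
import difflib


def mark_word_diffs(old, new, start="*", end="*"):
    """Return `new` with inserted or replaced words wrapped in start/end."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        chunk = " ".join(new_words[j1:j2])
        if tag == "equal":
            pieces.append(chunk)
        elif tag in ("replace", "insert"):
            pieces.append(f"{start}{chunk}{end}")
        # "delete" contributes no words to `new`, so nothing is emitted for it
    return " ".join(pieces)


sentence1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
sentence2 = "I'm enjoying the summer breeze on the beach while I am doing some pilates."
print(mark_word_diffs(sentence1, sentence2))
# To get red text in a terminal instead of asterisks:
print(mark_word_diffs(sentence1, sentence2, start="\033[91m", end="\033[0m"))
```

Because the comparison is word-based, a changed word is marked as a whole: in Example 2 of the question the result contains *Autumn,* rather than only the comma.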

Answer 2

Use difflib to get the matching blocks:

from difflib import SequenceMatcher

s1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
s2 = "I'm enjoying the summer breeze on the beach while I am doing some pilates."

x = SequenceMatcher(None, s1, s2)
m = x.get_matching_blocks()

[out]:

[Match(a=0, b=0, size=52),
 Match(a=52, b=55, size=2),
 Match(a=54, b=60, size=14),
 Match(a=68, b=74, size=0)]
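To make these triples easier to read, the text of each block can be printed. This is a small illustrative sketch; block_texts is a hypothetical helper name, not part of difflib.

```python
from difflib import SequenceMatcher


def block_texts(s1, s2):
    """Return the substring of s1 covered by each matching block."""
    blocks = SequenceMatcher(None, s1, s2).get_matching_blocks()
    return [s1[m.a:m.a + m.size] for m in blocks]


s1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
s2 = "I'm enjoying the summer breeze on the beach while I am doing some pilates."
for text in block_texts(s1, s2):
    print(repr(text))
```

The last entry is always an empty string, because get_matching_blocks() ends its result with a zero-size sentinel block.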

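Then wrap the substrings in ANSI color codes. Note that this version colors the matching blocks red and leaves the text between them uncolored: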


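s2_new = ""
i = 0  # position in s2 up to which text has already been copied
for m in x.get_matching_blocks():
    if m.b > i:
        # text between two matching blocks: copied without color
        s2_new += s2[i:m.b]
    # matching block: wrapped in red
    s2_new += f"\033[91m{s2[m.b:m.b+m.size]}\033[0m"
    i = m.b + m.size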
    
print(s2_new)

[out] (escape sequences shown literally):

\x1b[91mI'm enjoying the summer breeze on the beach while I \x1b[0mam \x1b[91mdo\x1b[0ming\x1b[91m some pilates.\x1b[0m\x1b[91m\x1b[0m
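Since the question asks for the differences to be red, the coloring can be inverted so that only the text outside the matching blocks is wrapped. This is a minimal sketch under that assumption; color_unmatched and its start/end parameters are hypothetical names chosen for illustration.

```python
from difflib import SequenceMatcher

RED, RESET = "\033[91m", "\033[0m"


def color_unmatched(s1, s2, start=RED, end=RESET):
    """Return s2 with the characters outside every matching block wrapped in start/end."""
    out, pos = [], 0
    for block in SequenceMatcher(None, s1, s2).get_matching_blocks():
        if block.b > pos:
            # gap before this block: text that exists only in s2
            out.append(f"{start}{s2[pos:block.b]}{end}")
        out.append(s2[block.b:block.b + block.size])
        pos = block.b + block.size
    # the zero-size sentinel block at len(s2) takes care of trailing text
    return "".join(out)


s1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
s2 = "I'm enjoying the summer breeze on the beach while I am doing some pilates."
print(color_unmatched(s1, s2))
```

At character level the highlighted pieces for this pair are "am " and "ing", because "do" is shared with "doing". Getting "am doing" as a single span requires a word-level comparison or coarser matching.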

Or, if you want coarser matches than get_matching_blocks() returns (fewer, longer blocks), try:

from difflib import SequenceMatcher

s1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
s2 = "I'm enjoying the summer breeze on the beach while I am doing some pilates."

x = SequenceMatcher(None, s1, s2)

matches = []
a, b = 0, 0
while True:
    m = x.find_longest_match(alo=a, ahi=len(s1), blo=b, bhi=len(s2))
    a, b = m.a + m.size, m.b + m.size
    if m.size == 0:
        break
    else:
        matches.append(m)
        
print(matches)

[out]:

[Match(a=0, b=0, size=52), Match(a=54, b=60, size=14)]
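With these two matches, everything in s2 between them is exactly "am doing", which is the span the question wants highlighted. The sketch below combines the loop above with the coloring step. It is an illustration rather than a definitive implementation, and greedy_matches and highlight_gaps are hypothetical helper names.

```python
from difflib import SequenceMatcher

RED, RESET = "\033[91m", "\033[0m"


def greedy_matches(s1, s2):
    """Collect longest matches left to right, each search starting after the previous match."""
    matcher = SequenceMatcher(None, s1, s2)
    matches, a, b = [], 0, 0
    while True:
        m = matcher.find_longest_match(a, len(s1), b, len(s2))
        if m.size == 0:
            return matches
        matches.append(m)
        a, b = m.a + m.size, m.b + m.size


def highlight_gaps(s1, s2, start=RED, end=RESET):
    """Return s2 with every span not covered by a greedy match wrapped in start/end."""
    out, pos = [], 0
    for m in greedy_matches(s1, s2):
        if m.b > pos:
            out.append(f"{start}{s2[pos:m.b]}{end}")
        out.append(s2[m.b:m.b + m.size])
        pos = m.b + m.size
    if pos < len(s2):
        # unmatched text after the last match
        out.append(f"{start}{s2[pos:]}{end}")
    return "".join(out)


s1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
s2 = "I'm enjoying the summer breeze on the beach while I am doing some pilates."
print(highlight_gaps(s1, s2))
```

One limitation of this greedy approach: the first search covers both strings in full, so if the longest match sits near the end, everything before it is treated as one highlighted gap even when shorter matches exist there.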
