How to replace dictionary values found in text with dictionary keys using Python and docx

Question

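I am trying to reformat a document with docx by replacing dictionary values found in the text with their corresponding keys. A straight swap from value to key does not work, because the first instance must read value (key). Any later instance of value (key), or of the value alone, should be converted to just the key. Example below:

Dictionary: {km: kilometer}

'kilometer (km) % is very powerful. Blah blah blah blah. kilometer this. kilometer (km) is km.', 'kilometer is called km'

To this:

'kilometer (km) % is very powerful. Blah blah blah blah. km this. km is km.', 'km is called km'

I tried the following, without success.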

for paragraph in doc.paragraphs:
    for key, value in acronyms.items():
        # Replace the value with the key in the paragraph text
        paragraph.text = paragraph.text.replace(value, key)

I also tried this, again without success:

from docx import Document
import re

# Define your acronym dictionary
acronyms = {
    'km': 'kilometer',
    # Add more acronyms as needed
}

print(acronyms)
# Open the Word document
doc = Document("acronymnsTest.docx")

# Extract text from the document
document_text = ""

for paragraph in doc.paragraphs:
    document_text += paragraph.text + " "

# Collapse runs of whitespace: split into words, then join back into one string
document_words = document_text.split()
document_words = ' '.join(document_words)

# Create a dictionary for first_used
first_used = {key: False for key in acronyms}

list1 = [document_words]
list2 = []

for qey, valves in acronyms.items():
    #print(valves)
    for inputText in list1:
        for key in first_used:
            if first_used[key] is False:
                replacement = "$%^#$$"
                if  valves + '(' + key + ')' in inputText:
                    inputText = re.sub(r'\b' + re.escape(valves) + r'(?=\s|%|\b)', replacement, inputText, count=1)
                    inputText = inputText.replace(' ' + valves + ' ', ' ' + key + ' ')
                    inputText = inputText.replace(' ' + valves + '.', ' ' + key + '.')
                    first_used[key] = True
                    print(inputText)
                    inputText = inputText.replace(replacement, valves)
                    list2.append(inputText)
                elif valves + ' ' + key + ' ' in inputText:
                    replacement = "$%^#$$"
                    inputText = re.sub(r'\b' + re.escape(valves + ' ' + key) + r'(?=\s|%|\b)', replacement, inputText, count=1)
                    inputText = inputText.replace(' ' + valves + ' ', ' ' + key + ' ')
                    inputText = inputText.replace(' ' + valves + '.', ' ' + key + '.')
                    inputText = inputText.replace(replacement, valves + ' (' + key + ')')
                    first_used[key] = True
                    list2.append(inputText)
                elif ' ' + key + ' ' in inputText:
                    replacement = "$%^#$$"
                    inputText = re.sub(r'\b' + re.escape(key) + r'\b', replacement, inputText, count=1)
                    inputText = inputText.replace(' ' + valves + ' ', ' ' + key + ' ')
                    inputText = inputText.replace(' ' + valves + '.', ' ' + key + '.')
                    inputText = inputText.replace(replacement, valves + ' (' + key + ')')
                    first_used[key] = True
                    list2.append(inputText)
                else:
                    print("nope")
            else:
                inputText = inputText.replace(' ' + acronyms[key] + ' ', ' ' + key + ' ')
                list2.append(inputText)

# Print or use the modified text
for modified_text in list2:
    print(modified_text)

Answer 1

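As far as I can tell, list1 = [document_words] is meant to make list1 a list of individual words. Because document_words has already been joined back into one string, list1 actually holds a single element; document_text.split() gives the list of words.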
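To see what each form holds, here is a small check. The sample string is made up for illustration and stands in for the text read from the document; word_list is a hypothetical name.

```python
# Made-up sample standing in for the text extracted from the document.
document_text = "kilometer (km) is far.  kilometer again."

# As in the question: split into words, then join back into one string.
document_words = " ".join(document_text.split())
list1 = [document_words]
print(len(list1))  # one element: the whole text as a single string

# A list of single words instead.
word_list = document_text.split()
print(word_list)
```

Note that split() leaves punctuation attached to the neighbouring word (for example "far."), so exact word comparisons will not match a value that is followed by a full stop or a parenthesis.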

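I think you also said that you want the first occurrence left as it is and every remaining occurrence substituted.

You can parse list1 and replace every matching word except the first with its acronym, e.g. by using a set rather than a dictionary to keep track of the words already matched. The snippet below keeps the question's first_used dictionary, keyed by value instead.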

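# Change first_used to be keyed by value (the long form) instead of by key.
first_used = {value: False for value in acronyms.values()}
# Reverse lookup: long form -> acronym.
value_to_key = {value: key for key, value in acronyms.items()}

result = []
for inputText in list1:  # list1 is assumed to be a list of single words
    # Word has an abbreviation sub
    if inputText in first_used:
        if first_used[inputText]:
            # Later occurrence: replace the word with its acronym.
            result.append(value_to_key[inputText])
        else:
            # First occurrence: keep it and note that it has been seen.
            first_used[inputText] = True
            result.append(inputText)
    # Word has no abbreviation sub
    else:
        # Nothing to sub; keep the word unchanged.
        result.append(inputText)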
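Word-by-word matching does not catch forms such as kilometer (km) or a value followed by punctuation, so a regular expression may fit the question's example more closely. The following is a minimal sketch under stated assumptions, not a definitive implementation: make_abbreviator and abbreviate are hypothetical names, matching is case-sensitive, the first match of a value is normalized to value (key), and bare keys already present in the text are left untouched.

```python
import re


def make_abbreviator(acronyms):
    """Return a function that abbreviates text and remembers first uses across calls.

    Hypothetical helper: `acronyms` maps acronym -> long form, e.g. {"km": "kilometer"}.
    """
    seen = set()  # long forms that have already been introduced once
    # Match the long form, optionally followed by "(key)" with flexible spacing.
    patterns = {
        key: re.compile(
            r"\b" + re.escape(value) + r"\b(?:\s*\(\s*" + re.escape(key) + r"\s*\))?"
        )
        for key, value in acronyms.items()
    }

    def abbreviate(text):
        for key, value in acronyms.items():
            def substitute(match, key=key, value=value):
                if value in seen:
                    return key  # later occurrence: acronym only
                seen.add(value)
                return f"{value} ({key})"  # first occurrence: long form (acronym)

            text = patterns[key].sub(substitute, text)
        return text

    return abbreviate


acronyms = {"km": "kilometer"}
abbreviate = make_abbreviator(acronyms)
paragraphs = [
    "kilometer (km) % is very powerful. Blah blah blah blah. "
    "kilometer this. kilometer (km) is km.",
    "kilometer is called km",
]
result = [abbreviate(text) for text in paragraphs]
for line in result:
    print(line)
```

Because the seen set lives in the closure, the first-use rule holds across paragraphs when the same abbreviate function is reused. With python-docx this could be applied as paragraph.text = abbreviate(paragraph.text) for each paragraph in doc.paragraphs; keep in mind that assigning to paragraph.text replaces the paragraph's runs with a single run, so character-level formatting inside the paragraph is lost.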
