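I am using a recursive function with regular-expression matching to generate text. It looks for word patterns made of synonym groups inside square brackets (pattern = '\[.*?\]'), where the synonyms in each group are separated by a string separator (I defined _STRING_SEPARATOR = #lkmkmksdmf###).
The initial sentence argument to the function looks like this: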
[decreasing#lkmkmksdmf###shrinking#lkmkmksdmf###falling#lkmkmksdmf###contracting#lkmkmksdmf###faltering#lkmkmksdmf###the contraction in] exports of services will drive national economy to a 0.3% real GDP [decline#lkmkmksdmf###decrease#lkmkmksdmf###contraction] in 2023 from an estimated 5.0% [decline#lkmkmksdmf###decrease#lkmkmksdmf###contraction] in 2022
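To make the input format concrete, here is a minimal sketch of how the pattern and the separator break such a sentence into synonym groups. The shortened sample sentence and the names SEPARATOR, PATTERN and groups are assumptions for illustration; only the pattern and the separator string come from the question.

```python
import re

# Separator and pattern as given in the question.
SEPARATOR = "#lkmkmksdmf###"
PATTERN = r"\[(.*?)\]"

# Shortened sample sentence (hypothetical, same format as the real input).
sentence = (
    "[decreasing#lkmkmksdmf###shrinking] exports will drive a GDP "
    "[decline#lkmkmksdmf###decrease] in 2023"
)

# Each match is the text between one pair of brackets; split it into synonyms.
groups = [match.split(SEPARATOR) for match in re.findall(PATTERN, sentence)]
print(groups)  # [['decreasing', 'shrinking'], ['decline', 'decrease']]
```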
The function looks like this:
import re
from copy import deepcopy

def combinations(self, sentence, sentence_list: list):
    pattern = r'\[.*?\]'
    if not re.findall(pattern, sentence, flags=re.IGNORECASE):
        # No bracket groups left: store the finished sentence once
        if sentence not in sentence_list:
            sentence_list.append(sentence)
    else:
        for single_match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            # Strip the surrounding brackets from the matched group
            repl = single_match.group(0)[1:-1]
            start_span = single_match.span()[0]
            end_span = single_match.span()[1]
            for candidate_word in repl.split(self._STRING_SEPARATOR):
                tmp_sentence = (
                    sentence[0: start_span] +
                    candidate_word +
                    sentence[end_span:]
                )
                new_sentence = deepcopy(tmp_sentence)
                self.combinations(new_sentence, sentence_list)
As a result, the sentence_list variable keeps accumulating sentences, like a DFS tree.
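I want to avoid using the same word twice. For example, if I have already used the word "decline", it should not be used again when the inner for loop picks from the next group of words after the recursive call. Is there a way to "store" the words already chosen from the first bracket while parsing the words in the second bracket pattern, and so on?
*It is like a DFS tree in which every node must store the state of each of its parent nodes.* How can I modify the function so that the same word is never reused within a single sentence of sentence_list? I tried using a parameter called "avoid_words: list" to store the list of parent-node words. But how do I remove a word from it again when I have to move on to the next word in the first bracket (or start from a different "root")?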
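One possible way around the "how do I remove it again" problem is to never mutate shared state: each recursive call receives its own copy of the used words, so nothing has to be undone when the loop moves on to a sibling. The sketch below is a standalone function, not the original method; the names unique_combinations, used and demo are assumptions, and it expands only the first remaining bracket in each call.

```python
import re

SEPARATOR = "#lkmkmksdmf###"
PATTERN = re.compile(r"\[(.*?)\]")


def unique_combinations(sentence, used=frozenset(), results=None):
    """Expand bracket groups depth-first, never reusing a word in one sentence."""
    if results is None:
        results = []
    match = PATTERN.search(sentence)
    if match is None:
        # Leaf of the DFS tree: no groups left, so the sentence is complete.
        results.append(sentence)
        return results
    for word in match.group(1).split(SEPARATOR):
        if word in used:
            continue  # an ancestor node already chose this word
        expanded = sentence[:match.start()] + word + sentence[match.end():]
        # `used | {word}` builds a new set for the child call only, so
        # siblings and other roots still see the parent's unchanged set.
        unique_combinations(expanded, used | {word}, results)
    return results


demo = "[fall#lkmkmksdmf###decline] leads to a [decline#lkmkmksdmf###drop]"
print(unique_combinations(demo))
# ['fall leads to a decline', 'fall leads to a drop', 'decline leads to a drop']
```

Because only the first bracket is expanded per call, each sentence is reached by exactly one path, so the "if sentence not in sentence_list" check is not needed in this variant.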
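The split() function separates the initial sentence into words (synonyms) and the plain sentence. Below is the commented code I would use if I had to solve a case like this.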
import itertools
import re

def all_combinations(sentence) -> list:
    pattern = r'\[(.*?)\]'
    resulting_sentences = []
    # Collect the contents of every bracketed synonym group
    list_of_synonyms = re.findall(pattern, sentence, flags=re.IGNORECASE)
    # Replace each group in the original sentence with an empty '[]' placeholder
    sentence = re.sub(pattern, '[]', sentence)
    # Split each group into a tuple of synonyms
    synonyms = [tuple(x.split('#lkmkmksdmf###')) for x in list_of_synonyms]
    # Build every combination (one word per group) as an ordered tuple.
    # A set holds only unique elements, so a combination with a repeated word
    # yields a set shorter than the tuple; such combinations are dropped.
    # The tuples themselves are kept because sets do not preserve word order.
    synonym_combinations = [
        combination for combination in itertools.product(*synonyms)
        if len(set(combination)) == len(combination)
    ]
    # Iterate over the combinations
    for combination in synonym_combinations:
        formatted_sentence = sentence
        # Fill the placeholders left to right with the words of this combination
        for synonym in combination:
            formatted_sentence = formatted_sentence.replace('[]', synonym, 1)
        # Append the formatted sentence to the resulting sentences
        resulting_sentences.append(formatted_sentence)
    return resulting_sentences
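The uniqueness filter can be looked at in isolation on a small input. The sketch below uses two made-up synonym groups that share one word, and it compares the set against the tuple's own length rather than a fixed 3, which is an assumption meant to cover any number of groups.

```python
import itertools

# Two hypothetical synonym groups that share the word "decline".
synonyms = [("fall", "decline"), ("decline", "drop")]

# Keep a combination only if no word appears in it twice.
unique = [
    combo for combo in itertools.product(*synonyms)
    if len(set(combo)) == len(combo)
]
print(unique)
# [('fall', 'decline'), ('fall', 'drop'), ('decline', 'drop')]
```

Note that itertools.product still generates every combination before filtering, so for many large groups the recursive approach that skips used words early may do less work.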