Build a list of possible sentences from a given letter quota and wordlist.txt

Question (votes: 0, answers: 1)

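I have a wordlist.txt file containing words separated by newlines.

Suppose I specify a quota for how many times each letter may be used, for example: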

n: 1
e: 1
w: 1
b: 1
o: 2
k: 1
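The quota for every other letter is 0.

How can I build sentences from the words in wordlist.txt so that the given letter quota is used up completely (every count reaches zero)?

For example, with the quota above it should return "new book" or "book new". Word order does not matter.

Both "new" and "book" already exist in wordlist.txt.

So the list of possible sentences might look like this: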

new book
book new
bow neko
neko bow
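
To make the requirement concrete, the quota can be read as a multiset of letters that a sentence must match exactly. The following is only a minimal sketch of that check; the function name `uses_exact_quota` is made up for illustration, and it assumes spaces and other non-letters do not count against the quota.

```python
from collections import Counter

# The quota from the question, written as a letter -> count mapping.
quota = Counter({"n": 1, "e": 1, "w": 1, "b": 1, "o": 2, "k": 1})

def uses_exact_quota(sentence, quota):
    """Return True if the letters of `sentence` consume the quota exactly."""
    # Assumption: only alphabetic characters count; spaces are ignored.
    letters = Counter(ch for ch in sentence.lower() if ch.isalpha())
    return letters == quota

for candidate in ["new book", "bow neko", "new bow"]:
    print(candidate, uses_exact_quota(candidate, quota))
```

Here "new book" and "bow neko" pass, while "new bow" fails because it needs two w's and leaves o and k unused.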
python text anagram
1 Answer
0 votes

Assume wordlist.txt contains a few additional words, so that there are multiple anagrams to handle:

bow
book
new
neko
ujang
wen
koob

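If the letters of book and koob are sorted, both give the same value, namely bkoo. This sorted value is what the code below calls the anagram_id: words that share it are anagrams of each other.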
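
As a small illustration of the anagram_id idea, the words from the sample list can be grouped by their sorted letters. This is a sketch only; `group_by_anagram_id` is a hypothetical helper name and is not part of the answer's code.

```python
from collections import defaultdict

def group_by_anagram_id(words):
    """Map each anagram id (the word's letters, sorted) to the words sharing it."""
    groups = defaultdict(list)
    for word in words:
        groups["".join(sorted(word))].append(word)
    return dict(groups)

words = ["bow", "book", "new", "neko", "ujang", "wen", "koob"]
groups = group_by_anagram_id(words)
print(groups["bkoo"])  # ['book', 'koob']
print(groups["enw"])   # ['new', 'wen']
```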

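Instead of spelling out the quota, I can simply write a string whose letters represent the quota (for example 'koob ewn'), because once the letters are sorted the result is the same.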
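
The equivalence between a quota and a string can be checked directly. In this sketch the dictionary form of the quota is an assumption about how one might store it; the question only lists the counts.

```python
quota = {"n": 1, "e": 1, "w": 1, "b": 1, "o": 2, "k": 1}

# Expand the quota: each letter, in alphabetical order, repeated by its count.
quota_id = "".join(letter * quota[letter] for letter in sorted(quota))

# Sort the letters of an input string that represents the same quota.
sentence_id = "".join(sorted("koob ewn".replace(" ", "")))

print(quota_id)                 # beknoow
print(quota_id == sentence_id)  # True
```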

from itertools import combinations, product

def filterOrigin(text):
  # Assumption: the original answer calls filterOrigin without defining it.
  # Here it keeps only letters, lowercased, so spaces do not count as letters.
  return ''.join(ch for ch in text.lower() if ch.isalpha())

def generate_anagrams(input_sentence='koob ewn', wordlist='wordlist.txt'):
  input_sentence = filterOrigin(input_sentence)

  with open(wordlist, 'r') as file: # read from the path passed in
    words = file.read().splitlines()

  anagram_id = []
  for word in words:
    anagram_id.append(''.join(sorted(word))) # sorted word is anagram id

  # every sorted selection of input letters is a candidate anagram id
  all_anagram_id = []
  for i in range(1, len(input_sentence)+1):
    combs = combinations(input_sentence, i)
    all_anagram_id += [''.join(sorted(comb)) for comb in combs]

  # keep the candidate ids that also occur in the word list
  all_registered_anagram_id = []
  for id_from_input in all_anagram_id:
    for id_from_wordlist in anagram_id:
      if id_from_input == id_from_wordlist:
        all_registered_anagram_id.append(id_from_wordlist)

  # map each registered id to the words that share it
  all_registered_anagram_values = dict()
  for aid in all_registered_anagram_id:
    all_registered_anagram_values[aid] = [words[i] for i, x in enumerate(anagram_id) if x == aid]

  # every combination of registered ids is a candidate sentence
  sentence_combs = []
  for l in range(1, len(all_registered_anagram_id)+1):
    sentence_combs.append(set(combinations(all_registered_anagram_id, l)))

  valid_sentences_id = []
  for comb in sentence_combs:
    for pair in comb:
      candidate = ''.join(pair)
      if sorted(input_sentence) == sorted(candidate): # is anagram?
        valid_sentences_id.append(pair)

  # expand each id into its words and emit every word choice
  valid_sentences = []
  for sid in valid_sentences_id:
    broadcasted = []
    for aid in sid:
      broadcasted.append(all_registered_anagram_values[aid])
    for sentence in product(*broadcasted):
      valid_sentences.append(' '.join(sentence))

  return valid_sentences

generate_anagrams()

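This returns (the order of the sentences may vary between runs, because the candidates pass through a set):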

['new book', 'new koob', 'wen book', 'wen koob', 'bow neko']
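
The approach above enumerates every subset of the input letters, which grows quickly with longer inputs. As a hedged alternative, not the method used in this answer, the same sentences can be found with a recursive search that subtracts each chosen word's letters from the remaining quota. The name `find_sentences` and its parameters are made up for this sketch, and each sentence is reported once with its words in sorted order, since word order does not matter.

```python
from collections import Counter

def find_sentences(letters, words):
    """Return word combinations whose letters use up `letters` exactly."""
    words = sorted({w for w in words if w})  # drop blanks and duplicates
    counts = [Counter(w) for w in words]
    results = []

    def search(remaining, start, chosen):
        if not remaining:  # empty Counter: the quota is fully consumed
            results.append(" ".join(chosen))
            return
        for i in range(start, len(words)):
            # A word fits only if none of its letters exceeds what is left.
            if all(remaining[ch] >= n for ch, n in counts[i].items()):
                # Passing i (not i + 1) allows the same word to be reused.
                search(remaining - counts[i], i, chosen + [words[i]])

    # Assumption: only alphabetic characters count toward the quota.
    search(Counter(ch for ch in letters.lower() if ch.isalpha()), 0, [])
    return results

words = ["bow", "book", "new", "neko", "ujang", "wen", "koob"]
print(find_sentences("koob ewn", words))
```

Starting each recursive call at index `start` means a sentence is never generated again with its words in a different order.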