我想根据给定的字符串在名为
dictionary.txt
的文件中找到成对的单词
不同的字母,例如GRIHWSNYP
.
dictionary.txt
看起来像这样:
AARHUS
AARON
ABABA
ABACK
ABAFT
ABANDON
ABANDONED
ABANDONING
ABANDONMENT
ABANDONS
ABASE
ABASED
...
预期输出的一些例子:
>>> f('ABCDEFGH')
There is no solution.
>>> f('GRIHWSNYP')
The pairs of words using all (distinct) letters in "GRIHWSNYP" are:
('SPRING', 'WHY')
>>> f('ONESIX')
The pairs of words using all (distinct) letters in "ONESIX" are:
('ION', 'SEX')
('ONE', 'SIX')
>>> f('UTAROFSMN')
The pairs of words using all (distinct) letters in "UTAROFSMN" are:
('AFT', 'MOURNS')
('ANT', 'FORUMS')
('ANTS', 'FORUM')
('ARM', 'FOUNTS')
('ARMS', 'FOUNT')
('AUNT', 'FORMS')
('AUNTS', 'FORM')
('AUNTS', 'FROM')
('FAN', 'TUMORS')
('FANS', 'TUMOR')
('FAR', 'MOUNTS')
('FARM', 'SNOUT')
('FARMS', 'UNTO')
('FAST', 'MOURN')
('FAT', 'MOURNS')
('FATS', 'MOURN')
('FAUN', 'STORM')
('FAUN', 'STROM')
('FAUST', 'MORN')
('FAUST', 'NORM')
('FOAM', 'TURNS')
('FOAMS', 'RUNT')
('FOAMS', 'TURN')
('FORMAT', 'SUN')
('FORUM', 'STAN')
('FORUMS', 'NAT')
('FORUMS', 'TAN')
('FOUNT', 'MARS')
('FOUNT', 'RAMS')
('FOUNTS', 'RAM')
('FUR', 'MATSON')
('MASON', 'TURF')
('MOANS', 'TURF')
我试过:
dictionary = 'dictionary.txt'
solutions = []
clean_dictionary = [i for i in dictionary if len(i)== len(set(i))] # Filter out words without repeated letters.
for word in clean_dictionary:
if not set(word) - set(letters):
first_word = word
second_letters = set(letters) - set(word)
for second in clean_dictionary:
if set(second) == set(second_letters):
second_word = second
pair = tuple(sorted((first_word, second_word)))
solutions.append(pair)
solutions = sorted(set(solutions))
if not solutions:
print('There is no solution.')
else:
print(f'The pairs of words using all (distinct) letters '
f'in "{letters}" are:'
)
for solution in solutions:
print(solution)
这似乎有效但速度很慢。我想知道如何改进代码以使其更有效率,或者是否有更好的方法来做到这一点?任何帮助表示赞赏。
您可以通过构建一个表来查找包含剩余字母的单词来加快计算速度,而不是遍历所有单词并一个一个地测试它们。
我也将这些对视为无序对 - 如果您想同时返回
("abc", "def")
和 ("def", "abc")
,只需删除最后一个 if 语句即可。
def preprocess(dictionary):
clean_dictionary = [word for word in dictionary if len(word) == len(set(word))]
lookup_table = defaultdict(list)
for word in clean_dictionary:
ordered_word = "".join(sorted(word))
lookup_table[ordered_word].append(word)
return clean_dictionary, lookup_table
def get_pairs(dictionary, letters):
dictionary, letter_table = preprocess(dictionary)
letters = set(letters)
for word1 in dictionary:
if letters.issuperset(word1):
ordered_word = "".join(sorted(letters.difference(word1)))
for word2 in letter_table[ordered_word]:
if word1 < word2:
yield word1, word2
您可以通过构建一个 trie 然后遍历 trie 以仅查找作为给定字母子集的单词,跳过无效单词来进一步加快查询速度。只有当您要对同一个字典执行多个查询,并且即使使用上述解决方案您仍然看到性能问题时,这才值得。