Permutations of a very large list

Question (votes: 0, answers: 1)

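I am trying to generate a very large set of permutations in Python. The goal is to combine items into groups of four or fewer, joined with 1) periods, 2) dashes, and 3) no separator at all. Order matters.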

# input
food = ['', 'apple', 'banana', 'bread', 'tomato', 'yogurt', ...]

# ideal output would be a list that contains strings like the following:
apple-banana-bread (no dashes before or after!)
apple.banana.bread (using periods)
applebananabread (no spaces)
apple-banana (by combining with the first item in the list, I also get shorter groups but need to delete empty items before joining)
... for all the possible groups of 4, order is important

# Requirements:
# Avoiding a symbol at the beginning or end of a resulting string
# Also creating groups of length 1, 2, and 3

I used itertools.permutations to create an itertools.chain (perms). However, this fails with a MemoryError when I remove the empty elements after converting to a list, even on a machine with a lot of memory.

import itertools

food = ['', 'apple', 'banana', 'bread', 'tomato', 'yogurt', ...]
perms = itertools.permutations(food, 4)
perms = [list(filter(None, tup)) for tup in perms]     # remove empty nested elements, to prevent two symbols in a row or a symbol before/after
perms = list(filter(None, perms))                      # remove empty lists; kept as a list so it can be iterated three times below

names = (
['.'.join(group) for group in perms] +     # join using periods
['-'.join(group) for group in perms] +     # join using dashes
[''.join(group) for group in perms]        # join without a separator
)

names = list(set(names))                   # remove all duplicates


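How can I make this code more memory-efficient so that it does not crash on a large list? If necessary, I can run the code separately for each separator (comma, period, direct concatenation).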

python performance out-of-memory permutation
1 answer
2 votes

I'm not sure what you would do with a saved list of 6 billion things, but if you want to press ahead, I think you have two strategies.
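For a sense of scale, here is a rough sketch of how quickly the number of strings grows. The function name and the sample size of 5 items are hypothetical; the real list length is not given in the question. The result is an upper bound before de-duplication, since single-item groups look the same for every separator.

```python
import math

def count_joined_names(n_items, max_len=4, n_separators=3):
    """Upper bound on generated strings: ordered groups of length 1..max_len,
    counted once per separator (duplicates are not subtracted)."""
    groups = sum(math.perm(n_items, r) for r in range(1, max_len + 1))
    return groups * n_separators

# Hypothetical size: 5 non-empty items.
print(count_joined_names(5))
```

The count grows roughly with the fourth power of the list length, so a list of a few hundred items already reaches billions of strings.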

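First, you can shrink the things stored in the list by using something like numpy uint8. That greatly reduces the size of the resulting collection, but you will not get the string format you want.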

In [15]: import sys                                                             

In [16]: import numpy as np                                                     

In [17]: list_of_strings = ['dog food'] * 1000000                               

In [18]: list_of_uint8s = np.ones(1000000, dtype=np.uint8)                      

In [19]: sys.getsizeof(list_of_strings)                                         
Out[19]: 8000056

In [20]: sys.getsizeof(list_of_uint8s)                                          
Out[20]: 1000096
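One way this idea might be applied to permutations is to store each group as a row of uint8 indices into the item list and to build the string only when it is needed. This is a sketch under assumptions, not the answer's own code: the helper name permutations_as_uint8 is hypothetical, the sample list leaves out the empty placeholder, and uint8 limits the list to 256 items.

```python
import itertools
import math

import numpy as np

def permutations_as_uint8(n_items, r):
    """Return every r-permutation of range(n_items) as an (N, r) uint8 index array."""
    if n_items > 256:
        raise ValueError("uint8 can only index up to 256 distinct items")
    # Flatten the tuples lazily so no intermediate Python list is built.
    flat = itertools.chain.from_iterable(itertools.permutations(range(n_items), r))
    count = math.perm(n_items, r) * r
    return np.fromiter(flat, dtype=np.uint8, count=count).reshape(-1, r)

food = ['apple', 'banana', 'bread', 'tomato', 'yogurt']  # sample data only
codes = permutations_as_uint8(len(food), 2)

# Decode a single row back into a joined string on demand.
first = '-'.join(food[i] for i in codes[0])
```

Each stored element costs one byte, so the array is far smaller than a list of strings, but billions of rows still need gigabytes of memory.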

Second, if you just want to "save" these items to some large file, you do not need to materialize the list in memory. Just use itertools.permutations and write the objects to the file on the fly.

In [48]: from itertools import permutations                                     

In [49]: stuff = ['dog', 'cat', 'mouse']                                        

In [50]: perms = permutations(stuff, 2)                                         

In [51]: with open('output.csv', 'w') as tgt: 
    ...:     for p in perms: 
    ...:         line = '-'.join(p) 
    ...:         tgt.write(line) 
    ...:         tgt.write('\n') 
    ...:                                                                        

In [52]: %more output.csv                                                       
dog-cat
dog-mouse
cat-dog
cat-mouse
mouse-dog
mouse-cat
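The same streaming idea could be extended to the question's full requirement: groups of length 1 to 4 and all three separators. The sketch below is one possible arrangement, not the answer's code. The names iter_names and write_names are hypothetical, and it drops the empty placeholder and loops over group lengths instead.

```python
import itertools

def iter_names(items, max_len=4, separators=('.', '-', '')):
    """Lazily yield joined strings for every ordered group of 1..max_len items."""
    items = [item for item in items if item]  # drop empty placeholders up front
    for r in range(1, max_len + 1):
        for group in itertools.permutations(items, r):
            if r == 1:
                yield group[0]  # a separator makes no difference for one item
                continue
            for sep in separators:
                yield sep.join(group)

def write_names(items, path, max_len=4):
    """Stream the names to a text file, one per line; return the line count."""
    written = 0
    with open(path, 'w') as tgt:
        for name in iter_names(items, max_len):
            tgt.write(name + '\n')
            written += 1
    return written

# Example call (the file name is an assumption):
# write_names(['', 'apple', 'banana', 'bread'], 'output.csv')
```

Because only one string exists at a time, memory use stays flat regardless of list size. Duplicates are not removed here: joining without a separator can produce the same string from different groups, so de-duplication would need a separate pass over the file.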