Removing tuples from a list removes some but not all of them

Question. Votes: 2. Answers: 1

I must be missing something very obvious.

I have a list of tuples that are (phrase, number) pairs. I want to remove every tuple whose phrase contains a word from my stopword list.

stopwords = ['for', 'with', 'and', 'in', 'on', 'down']
tup_list = [('faucet', 5185), ('kitchen', 2719), ('faucets', 2628),
            ('kitchen faucet', 1511), ('shower', 1471), ('bathroom', 1131),
            ('handle', 1048), ('for', 1035), ('cheap', 960), ('bronze', 807),
            ('tub', 797), ('sale', 771), ('sink', 762), ('with', 696),
            ('single', 620), ('kitchen faucets', 615), ('stainless faucet', 613),
            ('pull', 603), ('and', 477), ('in', 447), ('single handle', 430),
            ('for sale', 406), ('bathroom faucet', 392), ('on', 369),
            ('down', 363), ('head', 359), ('pull down', 357), ('wall', 351),
            ('faucet with', 350)]

for p,n in tup_list:
    print('p', p, p.split(), any(phrase in stopwords for phrase in p.split()))

print(len(tup_list))
for p,n in tup_list:
    if any(phrase in stopwords for phrase in p.split()):
        tup_list.remove((p,n))
        print('Removing', p)
print(len(tup_list))

print([item for item in tup_list if item[0] == 'in'])

When I run the code above, I get the following output:

p faucet ['faucet'] False
p kitchen ['kitchen'] False
p faucets ['faucets'] False
p kitchen faucet ['kitchen', 'faucet'] False
p shower ['shower'] False
p bathroom ['bathroom'] False
p handle ['handle'] False
p for ['for'] True
p cheap ['cheap'] False
p bronze ['bronze'] False
p tub ['tub'] False
p sale ['sale'] False
p sink ['sink'] False
p with ['with'] True
p single ['single'] False
p kitchen faucets ['kitchen', 'faucets'] False
p stainless faucet ['stainless', 'faucet'] False
p pull ['pull'] False
p and ['and'] True
p in ['in'] True
p single handle ['single', 'handle'] False
p for sale ['for', 'sale'] True
p bathroom faucet ['bathroom', 'faucet'] False
p on ['on'] True
p down ['down'] True
p head ['head'] False
p pull down ['pull', 'down'] True
p wall ['wall'] False
p faucet with ['faucet', 'with'] True
29
Removing for
Removing with
Removing and
Removing for sale
Removing on
Removing pull down
Removing faucet with
22
[('in', 447)]

My question: why isn't the tuple ('in', 447) removed? The output shows p in ['in'] True, meaning 'in' is in the stopword list, so why doesn't tup_list.remove((p,n)) remove it?

python python-3.x list tuples
1 Answer
0 votes

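Removing an item from a list shifts every later item down by one index. A for loop walks the list by position, so after a removal the item that slid into the current slot is never visited. In your list, 'in' comes directly after 'and': once 'and' is removed, 'in' moves into its position and the loop steps past it. The same thing happens to ('down', 363), which directly follows 'on'.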
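
To make the skipping visible on a smaller scale, here is a minimal sketch with made-up data (the names `words`, `to_drop`, `skipped` and `safe` are illustrative and not from the question). It contrasts removing while iterating over the list itself with iterating over a snapshot copy:

```python
# Made-up sample data, chosen so that two removable items sit next to each other.
words = ['a', 'x', 'y', 'b']
to_drop = {'x', 'y'}

# Removing while iterating over the same list: 'y' slides into the slot
# that 'x' occupied, and the loop moves on without ever looking at it.
skipped = list(words)
for w in skipped:
    if w in to_drop:
        skipped.remove(w)

# Iterating over a snapshot copy: every original item is visited once,
# while removals are applied to the real list.
safe = list(words)
for w in list(safe):
    if w in to_drop:
        safe.remove(w)

print(skipped)  # ['a', 'y', 'b']
print(safe)     # ['a', 'b']
```

Iterating over a copy is probably the smallest change to the original loop, although each remove() call still scans the list from the start.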

Here is one solution. It is not the most efficient, but it may suit your needs.

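# Collect the indices to drop first, so the list is not modified while looping.
remove_indices = set()

for i, (p, n) in enumerate(tup_list):
    if any(word in stopwords for word in p.split()):
        remove_indices.add(i)
        print('Removing', p)

# Rebuild the list, keeping only the entries whose index was not marked.
tup_list = [item for i, item in enumerate(tup_list) if i not in remove_indices]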
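
As an alternative sketch, the filtering can likely be done in a single pass by building a new list instead of tracking indices. The example below uses a shortened sample of the question's data; `sample` and `filtered` are names introduced here for illustration, and turning the stopwords into a set is an assumption made for faster lookups, not something the question requires:

```python
# Stopwords as a set so membership tests are fast.
stopwords = {'for', 'with', 'and', 'in', 'on', 'down'}

# Shortened sample of the (phrase, number) pairs from the question.
sample = [('faucet', 5185), ('kitchen faucet', 1511), ('for', 1035),
          ('and', 477), ('in', 447), ('for sale', 406),
          ('pull down', 357), ('faucet with', 350)]

# Keep a tuple only if none of the words in its phrase is a stopword.
filtered = [(p, n) for p, n in sample
            if stopwords.isdisjoint(p.split())]

print(filtered)  # [('faucet', 5185), ('kitchen faucet', 1511)]
```

Because the original list is never modified during iteration, no item can be skipped. If other references to the same list object need to see the change, assigning to a slice (for example `sample[:] = filtered`) updates it in place.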