Removing elements from a list using a list comprehension (Python)

Question

I have the following data:

[['The',
  'Fulton',
  'County',
  'Grand',
  'Jury',
  'said',
  'Friday',
  'an',
  'investigation',
  'of',
  "Atlanta's",
  'recent',
  'primary',
  'election',
  'produced',
  '``',
  'no',
  'evidence',
  "''",
  'that',
  'any',
  'irregularities',
  'took',
  'place',
  '.'],
 ['The',
  'jury',
  'further',
  'said',
  'in',
  'term-end',
  'presentments',
  'that',
  'the',
  'City',
  'Executive',
  'Committee',
  ',',
  'which',
  'had',
  'over-all',
  'charge',
  'of',
  'the',
  'election',
  ',',
  '``',
  'deserves',
  'the',
  'praise',
  'and',
  'thanks',
  'of',
  'the',
  'City',
  'of',
  'Atlanta',
  "''",
  'for',
  'the',
  'manner',
  'in',
  'which',
  'the',
  'election',
  'was',
  'conducted',
  '.']]

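So I have a list containing two other lists (in my real data, one big list holds 50,000 lists). I want to remove all punctuation and stop words such as "the", "a", "of", and so on.

Here is the code I wrote: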

import string
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')

punct = list(string.punctuation)
punct.append("``")
punct.append("''")
stops = set(stopwords.words("english"))

res = [[word.lower() for word in sentence if word not in punct or word.lower() not in stops] for sentence in dataset]

But it returns the same list of lists I started with. What is wrong with my code?

Answer 1

0
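votes

You should use "and" instead of "or": a word should be kept only if it is neither punctuation nor a stop word.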
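
A minimal sketch of the corrected comprehension follows. So that it runs without downloading the NLTK corpus, it assumes a small hand-written stops set as a stand-in for stopwords.words("english") and a shortened dataset; both are illustrative and not part of the original post.

```python
import string

# Punctuation tokens to drop, including the two quote tokens from the question.
# A set gives faster membership tests than a list on 50,000 sentences.
punct = set(string.punctuation) | {"``", "''"}

# Assumption: a tiny stand-in for set(stopwords.words("english")).
stops = {"the", "a", "an", "of", "that", "any", "no", "in"}

# Shortened version of the data from the question.
dataset = [
    ["The", "Fulton", "County", "Grand", "Jury", "said", "Friday",
     "``", "no", "evidence", "''", "."],
    ["The", "jury", "further", "said", "in", "term-end", "presentments",
     ",", "of", "the", "election", "."],
]

# Keep a word only if it is not punctuation AND not a stop word.
res = [
    [word.lower() for word in sentence
     if word not in punct and word.lower() not in stops]
    for sentence in dataset
]

print(res)
```

With the real NLTK stop word list the surviving words may differ, since that list is much longer than the stand-in used here.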

Answer 2

0
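votes

Because punct and stops do not overlap, every word is either not in punct or not in stops, so the "or" condition is always true and nothing gets filtered out.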
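
The effect can be checked on a handful of tokens. This is only a sketch: the stops set is a hypothetical three-word stand-in for the NLTK list, and the helper names keep_with_or and keep_with_and are made up for illustration.

```python
import string

punct = set(string.punctuation) | {"``", "''"}
stops = {"the", "a", "of"}  # assumption: stand-in for the NLTK stop words

# The two collections share no elements.
disjoint = punct.isdisjoint(stops)

words = ["The", ",", "election", "``"]


def keep_with_or(word):
    # Condition from the question: true whenever the word is missing
    # from at least one of the two collections, which is always.
    return word not in punct or word.lower() not in stops


def keep_with_and(word):
    # Corrected condition: the word must be missing from both.
    return word not in punct and word.lower() not in stops


or_kept = [w for w in words if keep_with_or(w)]
and_kept = [w for w in words if keep_with_and(w)]

print(or_kept)
print(and_kept)
```

By De Morgan's law, the corrected condition can also be written as not (word in punct or word.lower() in stops), which some readers find easier to follow.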
