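I want to exclude 'The', 'They' and 'My' from my word cloud. I am using the Python library "wordcloud" as shown below, and I added these 3 extra stopwords to the STOPWORDS set, but the word cloud still includes them. What do I need to change so that these 3 words are excluded?
The libraries I imported are: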
import numpy as np
import pandas as pd
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
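I tried adding elements to the STOPWORDS set as shown below. However, even though the words are added successfully, the word cloud still shows the 3 words I added to the STOPWORDS set.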
len(STOPWORDS)
Output: 192
Then I ran:
STOPWORDS.add('The')
STOPWORDS.add('They')
STOPWORDS.add('My')
Then I ran:
len(STOPWORDS)
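Output: 195
I am running Python 3.7.3.
I know I could modify the input text to remove these 3 words before running wordcloud (rather than trying to modify WordCloud's STOPWORDS set), but I would like to know whether there is a bug in WordCloud, or whether I am not updating STOPWORDS correctly?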
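The default for WordCloud is collocations=True, so frequent phrases of two adjacent words are included in the cloud. Importantly for your problem, stopword removal works differently for collocations: for example, "Thank you" is a valid collocation and may appear in the generated cloud even though "you" is in the default stopwords. Collocations that contain only stopwords are removed.
The not unreasonable rationale for this is that if stopwords were removed before building the list of collocations, then e.g. "thank you very much" would yield "thank very" as a collocation, which I definitely would not want.
So, to get your stopwords to work the way you expect, i.e. no stopwords appear in the cloud at all, you could use collocations=False, like this: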
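The ordering argument above can be sketched in plain Python. This is a simplified model under stated assumptions, not wordcloud's actual implementation: the helper names (`bigrams`, `bigrams_then_filter`, `filter_then_bigrams`) and the one-word stopword set are hypothetical and chosen only to reproduce the "thank very" example.

```python
def bigrams(tokens):
    """Return adjacent word pairs, in order."""
    return list(zip(tokens, tokens[1:]))


def bigrams_then_filter(tokens, stopwords):
    """Pair first, then drop only the pairs made up entirely of stopwords."""
    return [pair for pair in bigrams(tokens)
            if not all(word.lower() in stopwords for word in pair)]


def filter_then_bigrams(tokens, stopwords):
    """Drop stopwords first, then pair whatever is left."""
    kept = [word for word in tokens if word.lower() not in stopwords]
    return bigrams(kept)


tokens = "thank you very much".split()
stopwords = {"you"}  # illustrative stand-in, not the library's STOPWORDS

# Pairing first keeps the natural phrase "thank you".
print(bigrams_then_filter(tokens, stopwords))
# Filtering first glues together words that were never adjacent ("thank very").
print(filter_then_bigrams(tokens, stopwords))
```

In this model a stopword can still show up inside a pair, which matches the behaviour described above for collocations.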
my_wordcloud = WordCloud(
stopwords=my_stopwords,
background_color='white',
collocations=False,
max_words=10).generate(all_tweets_as_one_string)
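UPDATE:
Text containing The is not removed, while the is removed. This is why @Balaji Ambresh's code works, and you will see there are no caps in the cloud. This may be a defect in Wordcloud, not sure. However, adding e.g. The to stopwords does not affect this, because stopwords is always lowercased regardless of whether collocations is True or False. This is all visible in the source code :-)
For example, with the default collocations=True I get: [image]
Code: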
from wordcloud import WordCloud
from matplotlib import pyplot as plt
text = "The bear sat with the cat. They were good friends. " + \
"My friend is a bit bear like. He's lovely. The bear, the cat, the dog and me were all sat " + \
"there enjoying the view. You should have seen it. The view was absolutely lovely. " + \
"It was such a lovely day. The bear was loving it too."
cloud = WordCloud(collocations=False,
background_color='white',
max_words=10).generate(text)
plt.imshow(cloud, interpolation='bilinear')
plt.axis('off')
plt.show()
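To illustrate the point from the update that capitalised stopwords add nothing once the stopword set is lowercased, here is a minimal stdlib-only sketch. It is an assumption-laden model and not wordcloud's code: `count_words` is a hypothetical helper, and the tokenising regex is chosen only for illustration.

```python
import re
from collections import Counter


def count_words(text, stopwords):
    """Count words, skipping any word whose lowercase form is a stopword."""
    lowered = {s.lower() for s in stopwords}  # 'The' and 'the' collapse to one entry
    words = re.findall(r"\w[\w']*", text)     # assumed tokeniser, for illustration only
    return Counter(w for w in words if w.lower() not in lowered)


text = ("The bear sat with the cat. They were good friends. "
        "My friend is a bit bear like.")

base = {"the", "they", "my"}
with_caps = base | {"The", "They", "My"}  # mirrors STOPWORDS.add('The') etc.

counts_base = count_words(text, base)
counts_caps = count_words(text, with_caps)

# Both stopword sets lowercase to the same thing, so the counts are identical.
print(counts_base == counts_caps)
print(counts_base.most_common(3))
```

Under this model, adding 'The', 'They' and 'My' changes the size of the set but not what gets filtered.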
pip install nltk
Don't forget to download the stopwords:
python
>>> import nltk
>>> nltk.download('stopwords')
Try this:
from wordcloud import WordCloud
from matplotlib import pyplot as plt
from nltk.corpus import stopwords
# Use a separate name so the imported nltk stopwords module is not shadowed.
english_stopwords = set(stopwords.words('english'))
text = "The bear sat with the cat. They were good friends. " + \
"My friend is a bit bear like. He's lovely. The bear, the cat, the dog and me were all sat " + \
"there enjoying the view. You should have seen it. The view was absolutely lovely. " + \
"It was such a lovely day. The bear was loving it too."
# Lowercase the text before generating, so no capitalised words reach the cloud.
cloud = WordCloud(stopwords=english_stopwords,
background_color='white',
max_words=10).generate(text.lower())
plt.imshow(cloud, interpolation='bilinear')
plt.axis('off')
plt.show()