How to load stop words from a txt file when using wordcloud STOPWORDS

Problem description

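I am generating a word cloud from a PDF file. I can add stop words from a list, but I cannot get it to work with a txt file. I know the problem is in how I pass the file path.

Editing the stop words through a list already works, but I would like to use a txt file instead, because eventually I want to use different stop-word files for different purposes.

Thanks in advance for your help.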

```
# viz libs
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
# img libs
from PIL import Image
# binary array lib
import numpy as np
# pdf reader
import PyPDF4

pdfFileObj = open('Test-Resume-Doc.pdf', 'rb')
pdfReader = PyPDF4.PdfFileReader(pdfFileObj)
print(pdfReader.numPages)
pageObj = pdfReader.getPage(0)
pageText = pageObj.extractText()
pdfFileObj.close()

# set stopwords
stopwords = set(STOPWORDS)

# adding stop words from a list works:
# stopwords.update(["word1", "word2", "word3", ...])
# trying to load them from a txt file: the program runs but ignores the file's contents;
# the problem is in how the path is passed
stopwords.update(['stopwords.txt'])

rsMask = np.array(Image.open('Resume_WordCloud.png'))
# create wordcloud with stopwords, using the text extracted before the file was closed
cloud = WordCloud(stopwords=stopwords, background_color="black", mask=rsMask).generate(pageText)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.savefig('path.../PythonPDFRW/Resume_WordCloud_fromPython.png')
plt.show()
```
python-3.x stop-words word-cloud
1 Answer

Use a for loop to read each line of the file and strip the trailing newline (\n). It is not elegant, but it works.

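```
# open the stop-word file from the question and add one word per line
with open('stopwords.txt') as text_file:
    for line in text_file:
        stopwords.add(line.strip('\n'))
print(stopwords)
```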
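
For context, stopwords.update(['stopwords.txt']) adds the literal string 'stopwords.txt' as a single stop word; it never opens the file. Since the question mentions switching between several stop-word files, the loop could be wrapped in a small helper. The following is only a sketch: the name load_stopwords, its base parameter, the UTF-8 encoding, and the one-word-per-line file layout are all assumptions, not part of wordcloud.

```python
from pathlib import Path


def load_stopwords(path, base=()):
    """Return a set holding `base` plus one stop word per line of `path`.

    Hypothetical helper, not part of wordcloud. Assumes a UTF-8 text file
    with one word per line.
    """
    words = set(base)
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip().lower()  # drop surrounding whitespace, normalize case
        if word:  # skip blank lines
            words.add(word)
    return words


# Possible usage with the question's code (requires wordcloud to be installed):
# from wordcloud import STOPWORDS
# stopwords = load_stopwords('stopwords.txt', base=STOPWORDS)
```

Passing a different path per call would keep each purpose-specific word list in its own file, while base lets the built-in STOPWORDS set be included or left out.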
