Making a Wordcloud from a WhatsApp text file containing Chinese characters


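I'm very new to programming, and I'm trying to generate a word cloud from a WhatsApp text file that contains Chinese characters.

I have been trying to combine two tutorials I found online, but it doesn't work.

For reference, I'm using the two tutorials below, together with PyCharm: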

import pandas as pd
from PIL import Image
from os import path
import os
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS
import jieba

# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()

stopwords_path = d + '/wc_cn/stopwords_cn_en.txt'
# Chinese fonts must be set
font_path = d + '/fonts/SourceHanSerif/SourceHanSerifK-Light.otf'


# importing text file
df1 = pd.read_csv('ourchat.txt', sep=r'[ap]m -', names=['time', 'message'])

userdict_list = ['阿Q', '孔乙己', '单四嫂子']

# The function for processing text with Jieba
def jieba_processing_txt(text):
    for word in userdict_list:
        jieba.add_word(word)

    mywordlist = []
    seg_list = jieba.cut(text, cut_all=False)
    liststr = "/ ".join(seg_list)

    with open(stopwords_path, encoding='utf-8') as f_stop:
        f_stop_text = f_stop.read()
        f_stop_seg_list = f_stop_text.splitlines()

    for myword in liststr.split('/'):
        if not (myword.strip() in f_stop_seg_list) and len(myword.strip()) > 1:
            mywordlist.append(myword)
    return ' '.join(mywordlist)

#splitting the message into name and original message
df2 = df1['message'].astype(str).str.split(":", expand=True,n=1)
df_all = pd.concat([df1, df2], axis=1)
df_all = df_all.rename(columns={'message': 'total', 0:'name', 1:'message'})
df_all.drop('total', axis=1, inplace=True)

#saving the messages which are in the time column instead of message column

df_all.loc[df_all.time.str.contains(r'[a-zA-Z]')==True, 'message'] = df_all[df_all.time.str.contains(r'[a-zA-Z]')==True].time
df_all.fillna(' ', inplace=True)

#Delete rows where name includes an activity on group #
df_all = df_all[df_all.name.str.contains("added|changed|created|left")==False]

#Store the text in a variable
text = ' '.join(df_all['message'])

#Remove stopwords if any (you can add more to this list)
STOPWORDS.update(["Tom", "PM", "missed video", "AM", "https", "image", "image omitted", "omitted", "video", "video call", "said", ""])

#Use a masked image to create good looking word clouds
image_mask = np.array(Image.open("heartcloud.png"))

#creating wordcloud (font_path is needed to render Chinese; pass the STOPWORDS set itself, since set.add() returns None)
wc = WordCloud(font_path=font_path, background_color="white", max_words=2000, mask=image_mask, stopwords=STOPWORDS)
wc.generate(jieba_processing_txt(text))
plt.imshow(wc)
wc.to_file("word_cloud.png")
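The pandas steps above (splitting each line on `[ap]m -`, then splitting the message on the first colon, then dropping group-activity rows) can also be written as a plain line-by-line parser. This is a minimal sketch using only the standard library. It assumes export lines look like `12/31/19, 9:15 pm - Tom: 你好`; that is an assumption, because the WhatsApp export format depends on the phone's locale and 12/24-hour setting. `LINE_RE` and `parse_chat` are hypothetical names, not part of either tutorial.

```python
import re

# ASSUMED line format (varies with phone locale and 12/24-hour setting):
#   "12/31/19, 9:15 pm - Tom: 你好"
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}\s?[ap]m) - (?P<rest>.*)$",
    re.IGNORECASE,
)


def parse_chat(lines):
    """Return (time, name, message) tuples from an exported chat.

    Lines without a leading timestamp continue the previous message.
    Timestamped lines without a "Name: " part are group events
    (e.g. "Tom added Anna") and are skipped.
    """
    messages = []
    in_message = False  # True if the last timestamped line was a real message
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = LINE_RE.match(line)
        if match is None:
            # Continuation line of a multi-line message
            if in_message:
                time, name, text = messages[-1]
                messages[-1] = (time, name, text + "\n" + line)
            continue
        # Split on the first ": " only, so colons inside the message survive
        name, sep, text = match.group("rest").partition(": ")
        if not sep:
            # No sender, so this is a group activity line
            in_message = False
            continue
        messages.append((match.group("time"), name, text))
        in_message = True
    return messages
```

Since a file object is an iterable of lines, the text for the word cloud could then be built with `text = ' '.join(msg for _, _, msg in parse_chat(f))` inside a `with open('ourchat.txt', encoding='utf-8') as f:` block.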

I get this error message:

Traceback (most recent call last):
  File "C:/Users/Tom/Desktop/Wordcloud/whatsapp.py", line 66, in <module>
    wc.generate(jieba_processing_txt(text))
  File "C:/Users/Tom/Desktop/Wordcloud/whatsapp.py", line 32, in jieba_processing_txt
    with open(stopwords_path, encoding='utf-8') as f_stop:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/Tom/Desktop/Wordcloud/wc_cn/stopwords_cn_en.txt'
1 Answer

The error message says exactly what is wrong:

No such file or directory: 'C:/Users/Tom/Desktop/Wordcloud/wc_cn/stopwords_cn_en.txt'

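The file does not exist at that path. stopwords_path is built from d, the directory that contains whatsapp.py, so Python expects a wc_cn folder next to the script with stopwords_cn_en.txt inside it. Either create that file (for example, copy it from the tutorial if it provides one), or modify your program so that it does not depend on it.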
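One way to make the stopwords file optional is to check for it before opening it. This is a minimal sketch, assuming a plain-text file with one stopword per line (the format `jieba_processing_txt` already expects, since it calls `splitlines()`). `load_stopwords` and `filter_words` are hypothetical helper names; `filter_words` mirrors the original filtering condition (not a stopword, longer than one character).

```python
import os


def load_stopwords(stopwords_path):
    """Read one stopword per line; return an empty set if the file is missing."""
    if not os.path.isfile(stopwords_path):
        print(f"Warning: {stopwords_path} not found, continuing without stopwords")
        return set()
    with open(stopwords_path, encoding="utf-8") as f_stop:
        # Skip blank lines so "" never ends up in the set
        return {line.strip() for line in f_stop if line.strip()}


def filter_words(words, stopwords):
    """Keep words that are not stopwords and are longer than one character,
    using the same condition as jieba_processing_txt."""
    kept = []
    for word in words:
        word = word.strip()
        if word not in stopwords and len(word) > 1:
            kept.append(word)
    return kept
```

Inside jieba_processing_txt, the `with open(...)` block could then be replaced by `stopwords = load_stopwords(stopwords_path)` and the loop by `' '.join(filter_words(seg_list, stopwords))`, since the generator returned by `jieba.cut` can be iterated directly.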

© www.soinside.com 2019 - 2024. All rights reserved.