I get a Unicode error whenever I try to write a Twitter data stream to a file


This is my Python code for retrieving data from Twitter. When I try to store the data in gannie.txt, I get the following error.

File "D:\software\Anaconda\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 5-6: character maps to <undefined>

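Any help with this would be appreciated. I am new to this kind of text mining, and I am trying to build a sentiment analysis project using natural language processing.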

Here is my code:

import re

# `tweets` is assumed to be an iterable of tweet objects fetched earlier (not shown).
tweetCount = 0  # assumed starting value; the original snippet does not show its initialization

# No encoding is given here, so the platform default (cp1252 in the traceback above) is used.
outF = open("gannie.txt", "a")
for tweet in tweets:
    #print(tweet.text)
    Tweet = tweet.text

    # Convert www.* or https?://* to URL
    Tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', Tweet)

    # Convert @username to TWITTER_USER
    Tweet = re.sub(r'@[^\s]+', 'TWITTER_USER', Tweet)

    # Remove additional white spaces
    Tweet = re.sub(r'[\s]+', ' ', Tweet)

    # Replace #word with word (handling hashtags)
    Tweet = re.sub(r'#([^\s]+)', r'\1', Tweet)

    # Trim surrounding quote characters
    Tweet = Tweet.strip('\'"')

    # Delete happy and sad face emoticons from the tweet
    a = ':)'
    b = ':('
    Tweet = Tweet.replace(a, '')
    Tweet = Tweet.replace(b, '')

    # Delete the Twitter @username tag and skip retweets
    tag = 'TWITTER_USER'
    rt = 'RT'
    url = 'URL'
    Tweet = Tweet.replace(tag, '')
    tweetCount += 1
    if rt in Tweet:
        continue
    Tweet = Tweet.replace(url, '')
    print(Tweet)
    outF.write(Tweet)
    outF.write("\n")
outF.close()
python nlp data-analysis text-mining sentiment-analysis
1 answer

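I solved this simply by adding encoding="utf-8" to the line that opens the file. Without it, open() falls back to the platform's default encoding (cp1252 in the traceback above), which has no mapping for many of the characters that appear in tweets, such as emoji.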

Before: outF = open("gannie.txt", "a")

After: outF = open("gannie.txt", "a", encoding="utf-8")
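
To see the difference in isolation, here is a minimal sketch rather than the asker's actual pipeline. The sample string, the temporary file name gannie_demo.txt, and the variable names are all made up for illustration. It first checks that cp1252 cannot encode an emoji, then writes and re-reads the same text with an explicit UTF-8 encoding.

```python
import os
import tempfile

# Hypothetical sample text: an emoji that cp1252 has no mapping for.
sample = "Great day \U0001F600"

# Reproduce the failure directly, without touching a file.
try:
    sample.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Write to a throwaway file (hypothetical name) with an explicit encoding.
path = os.path.join(tempfile.mkdtemp(), "gannie_demo.txt")
with open(path, "a", encoding="utf-8") as out_f:
    out_f.write(sample)
    out_f.write("\n")

# Read it back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as in_f:
    round_trip = in_f.read()

print(cp1252_ok)
print(round_trip == sample + "\n")
```

Using a with block also closes the file even if a write raises, so the explicit outF.close() call is no longer needed. The file has to be read back with encoding="utf-8" as well, otherwise the same platform default applies on the reading side.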
