Error when applying tokenize to JSON with Python

Question (votes: 0, answers: 1)

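I have been following a tutorial that appears to be popular with other users, but one error persists and I cannot find a solution. I am running this code with PyCharm and Python 3.6. I appreciate your time and help, thank you.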

Code:

import json
from collections import Counter
import re
from nltk.corpus import stopwords
import string


with open(fname, 'r', newline='\r\n') as f:
    count_all = Counter()
    for line in f:

        tweet = json.loads(line)
        terms_stop = [term for term in preprocess(tweet['text']) if term not in stop]
        terms_single = set(terms_stop)

        terms_hash = [term for term in preprocess(tweet['text']) if term.startswith('#')]

The error I am getting is:

Traceback (most recent call last):
  File "C:/Users/Sukhivinder/PycharmProjects/mscProjectOne/sentimentJSONfile.py", line 50, in <module>
    tweet = json.loads(line)
  File "C:\Program Files\Python36\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Program Files\Python36\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files\Python36\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Data:

{"created_at":"Wed Nov 15 21:37:57 +0000 2017","id":930912780831678464,"id_str":"930912780831678464","text":"Greatest Brexit speech ever? LABOUR MP\u2019s address will make your neck hairs stand up https:\/\/t.co\/5G3uEELEll","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":886309819245879296,"id_str":"886309819245879296","name":"HMS","screen_name":"HMS150446","location":"England, United Kingdom","url":null,"description":"Any R\/T is not an endorsement but a way of sharing articles received on my twitter feed which I think are interesting and want to share.","translator_type":"none","protected":false,"verified":false,"followers_count":491,"friends_count":1480,"listed_count":4,"favourites_count":12236,"statuses_count":27375,"created_at":"Sat Jul 15 19:41:42 +0000 
2017","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/886676198155329538\/E8RsRDyz_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/886676198155329538\/E8RsRDyz_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/886309819245879296\/1500235099","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/5G3uEELEll","expanded_url":"https:\/\/www.express.co.uk\/news\/politics\/880048\/Brexit-speech-Labour-MP-Peter-Shore-EU","display_url":"express.co.uk\/news\/politics\/\u2026","indices":[84,107]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1510781877130"}
{"created_at":"Wed Nov 15 21:37:57 +0000 2017","id":930912782056345600,"id_str":"930912782056345600","text":"RT @ThatTimWalker: This is a courageous journalist and a courageous newspaper. Would Mrs May have owned up to our Russian problem if i\u2026 ","source":"\u003ca href=\"http:\/\/twitter.com\/#!\/download\/ipad\" rel=\"nofollow\"\u003eTwitter for iPad\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2687419047,"id_str":"2687419047","name":"Elizabeth#FBPE#StopBrexit","screen_name":"epcarruthers","location":"Edinburgh and Dormont","url":null,"description":"wife,mother, grandmother, gardener, LibDemocrat, and owner of 2 delightful cats.","translator_type":"none","protected":false,"verified":false,"followers_count":248,"friends_count":185,"listed_count":2,"favourites_count":18269,"statuses_count":29872,"created_at":"Tue Jul 08 15:24:35 +0000 
2014","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/715085957146525696\/edS1d2lF_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/715085957146525696\/edS1d2lF_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/2687419047\/1510238275","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Tue Nov 14 23:24:41 +0000 2017","id":930577252827516928,"id_str":"930577252827516928","text":"This is a courageous journalist and a courageous newspaper. 
Would Mrs May have owned up to our Russian problem if i\u2026 https:\/\/t.co\/U83rrcqahM","display_text_range":[0,140],"source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":60606206,"id_str":"60606206","name":"Tim Walker","screen_name":"ThatTimWalker","location":"London","url":null,"description":"A point of view","translator_type":"none","protected":false,"verified":true,"followers_count":18870,"friends_count":887,"listed_count":214,"favourites_count":8392,"statuses_count":26689,"created_at":"Mon Jul 27 14:07:16 +0000 2009","utc_offset":0,"time_zone":"Casablanca","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1B95E0","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/914607099220496384\/dtzHhd2V_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/914607099220496384\/dtzHhd2V_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/60606206\/1398249447","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"quoted_status_id":930576544820547584,"quoted_status_id_str":"930576544820547584","quoted_status":{"created_at":"Tue Nov 14 23:21:52 +0000 
2017","id":930576544820547584,"id_str":"930576544820547584","text":"Playing catch-up on Brexit-Trump-Russia? My piece from May. Read it before Cambridge Analytica (of FBI &amp; Wikileaks\u2026 https:\/\/t.co\/pnUZeed6kq","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":722242009,"id_str":"722242009","name":"Carole Cadwalladr","screen_name":"carolecadwalla","location":null,"url":"https:\/\/www.theguardian.com\/profile\/carolecadwalladr","description":"Late adopter. Early giver-upper. Guardian & Observer writer.","translator_type":"none","protected":false,"verified":false,"followers_count":44945,"friends_count":1866,"listed_count":547,"favourites_count":660,"statuses_count":2899,"created_at":"Sat Jul 28 14:06:01 +0000 2012","utc_offset":3600,"time_zone":"Amsterdam","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/875727729525747717\/ZAIcCXFJ_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/875727729525747717\/ZAIcCXFJ_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/722242009\/1503701353","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"
is_quote_status":false,"extended_tweet":{"full_text":"Playing catch-up on Brexit-Trump-Russia? My piece from May. Read it before Cambridge Analytica (of FBI &amp; Wikileaks fame!) sues @guardian into oblivion. They're trying to shut this - me, us - down. \nhttps:\/\/t.co\/KKZUJ81NE9","display_text_range":[0,225],"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/KKZUJ81NE9","expanded_url":"https:\/\/www.theguardian.com\/technology\/2017\/may\/07\/the-great-british-brexit-robbery-hijacked-democracy?CMP=share_btn_tw","display_url":"theguardian.com\/technology\/201\u2026","indices":[202,225]}],"user_mentions":[{"screen_name":"guardian","name":"The Guardian","id":87818409,"id_str":"87818409","indices":[131,140]}],"symbols":[]}},"quote_count":169,"reply_count":113,"retweet_count":2694,"favorite_count":2551,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/pnUZeed6kq","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/930576544820547584","display_url":"twitter.com\/i\/web\/status\/9\u2026","indices":[120,143]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"is_quote_status":true,"extended_tweet":{"full_text":"This is a courageous journalist and a courageous newspaper. Would Mrs May have owned up to our Russian problem if it hadn't got into newspapers? I wonder. 
https:\/\/t.co\/Qj8AkSxnIx","display_text_range":[0,154],"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/Qj8AkSxnIx","expanded_url":"https:\/\/twitter.com\/carolecadwalla\/status\/930576544820547584","display_url":"twitter.com\/carolecadwalla\u2026","indices":[155,178]}],"user_mentions":[],"symbols":[]}},"quote_count":2,"reply_count":5,"retweet_count":206,"favorite_count":293,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/U83rrcqahM","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/930577252827516928","display_url":"twitter.com\/i\/web\/status\/9\u2026","indices":[117,140]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"quoted_status_id":930576544820547584,"quoted_status_id_str":"930576544820547584","quoted_status":{"created_at":"Tue Nov 14 23:21:52 +0000 2017","id":930576544820547584,"id_str":"930576544820547584","text":"Playing catch-up on Brexit-Trump-Russia? My piece from May. Read it before Cambridge Analytica (of FBI &amp; Wikileaks\u2026 https:\/\/t.co\/pnUZeed6kq","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":722242009,"id_str":"722242009","name":"Carole Cadwalladr","screen_name":"carolecadwalla","location":null,"url":"https:\/\/www.theguardian.com\/profile\/carolecadwalladr","description":"Late adopter. Early giver-upper. 
Guardian & Observer writer.","translator_type":"none","protected":false,"verified":false,"followers_count":44945,"friends_count":1866,"listed_count":547,"favourites_count":660,"statuses_count":2899,"created_at":"Sat Jul 28 14:06:01 +0000 2012","utc_offset":3600,"time_zone":"Amsterdam","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/875727729525747717\/ZAIcCXFJ_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/875727729525747717\/ZAIcCXFJ_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/722242009\/1503701353","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"extended_tweet":{"full_text":"Playing catch-up on Brexit-Trump-Russia? My piece from May. Read it before Cambridge Analytica (of FBI &amp; Wikileaks fame!) sues @guardian into oblivion. They're trying to shut this - me, us - down. 
\nhttps:\/\/t.co\/KKZUJ81NE9","display_text_range":[0,225],"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/KKZUJ81NE9","expanded_url":"https:\/\/www.theguardian.com\/technology\/2017\/may\/07\/the-great-british-brexit-robbery-hijacked-democracy?CMP=share_btn_tw","display_url":"theguardian.com\/technology\/201\u2026","indices":[202,225]}],"user_mentions":[{"screen_name":"guardian","name":"The Guardian","id":87818409,"id_str":"87818409","indices":[131,140]}],"symbols":[]}},"quote_count":169,"reply_count":113,"retweet_count":2694,"favorite_count":2551,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/pnUZeed6kq","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/930576544820547584","display_url":"twitter.com\/i\/web\/status\/9\u2026","indices":[120,143]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"is_quote_status":true,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"ThatTimWalker","name":"Tim Walker","id":60606206,"id_str":"60606206","indices":[3,17]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1510781877422"}

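I have added the full error message in response to a user's request, but my post consists mostly of code, so further text is needed for the edit to be accepted.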

python json python-3.x tokenize
1 Answer
0 votes

TL;DR:

For the open() function, use U in the mode string.

What about my problem?

I changed your open() call to use Universal Newline Support. It goes...

From:

with open(fname, 'r', newline='\r\n') as f:

To:

with open(fname, 'rU') as f:

This resolved the problem in my tests.
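
For newer Python 3 releases, the same idea might look like the sketch below. On Python 3, universal newlines is already the default when `newline` is left unset, and the `'U'` flag was deprecated and then removed in Python 3.11, so plain `'r'` has the same effect. The helper name `read_tweets`, the UTF-8 encoding, and the choice to skip blank or unparsable lines are assumptions made for illustration; they are not part of the original code.

```python
import json


def read_tweets(fname):
    """Yield one decoded tweet per non-empty line of a JSON-lines file.

    Hypothetical helper: the name and the skip-on-error policy are assumptions.
    """
    # newline=None (the default) enables universal newlines: '\r', '\n' and
    # '\r\n' are all treated as line endings and translated to '\n'.
    with open(fname, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # an empty string would make json.loads raise
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # skip fragments that are not complete JSON
```

Inside the original loop, `for tweet in read_tweets(fname):` would then replace the `open()` call and the `json.loads(line)` step.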

What is Universal Newline Support?

From PEP 278:

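This PEP discusses a way in which Python can support I/O on files which have a newline format that is not the native format on the platform, so that Python on each platform can read and import files with CR (Macintosh), LF (Unix) or CR LF (Windows) line endings.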
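
The difference between the two `newline` settings can be seen in a small, self-contained sketch. The sample bytes and the temporary file are made up for the demonstration; only `open()` and its documented `newline` parameter are taken from the discussion above.

```python
import os
import tempfile

# Four records separated by three different line-ending conventions.
raw = b'one\rtwo\nthree\r\nfour'

fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as fh:
    fh.write(raw)

# newline=None (the default): universal newlines, every ending becomes '\n'.
with open(path, 'r', newline=None) as f:
    universal = f.readlines()

# newline='\r\n': only CR LF terminates a line, and nothing is translated.
with open(path, 'r', newline='\r\n') as f:
    crlf_only = f.readlines()

os.remove(path)

print(universal)  # ['one\n', 'two\n', 'three\n', 'four']
print(crlf_only)  # ['one\rtwo\nthree\r\n', 'four']
```

With `newline='\r\n'`, any record that ends in a bare CR or LF stays glued to the next one, so a line handed to `json.loads` may not hold exactly one JSON object.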
