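I am using Python (pandas) to read a JSON file containing raw tweets, but I get the following error:
ValueError: Unexpected character found when decoding array value (2)
Any help would be appreciated.
Edit: here is a sample of the JSON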
{"created_at":"Sat Nov 16 14:15:52 +0000 2019","id":1195707056365461505,"id_str":"1195707056365461505","text":"Any Arsenal red members on here, please DM me. ..got a few questions\ud83d\ude05\ud83e\udd14","source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":974846850,"id_str":"974846850","name":"Rico Rodrigo","screen_name":"DatGuyTy_online","location":"Brum","url":null,"description":"Aspiring accountant x Arsenal enthusiast x anime fan","translator_type":"none","protected":false,"verified":false,"followers_count":647,"friends_count":901,"listed_count":9,"favourites_count":24989,"statuses_count":24628,"created_at":"Tue Nov 27 22:25:31 +0000 2012","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":null,"contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/1071377159682514945/Np4nGX5m_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/1071377159682514945/Np4nGX5m_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/974846850/1554183093","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1573913752057"}
This is the code I am using to read the file:
import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
import json
import os
tweet_file = 'raw_data.json'
tweets = pd.read_json(tweet_file, convert_dates=True, lines=True, encoding='utf-8')
I ran into this error with my own JSON file when trying to read it with pandas:
File ~/.local/lib/python3.9/site-packages/pandas/io/json/_json.py:1133, in FrameParser._parse_no_numpy(self)
1129 orient = self.orient
1131 if orient == "columns":
1132 self.obj = DataFrame(
-> 1133 loads(json, precise_float=self.precise_float), dtype=None
1134 )
1135 elif orient == "split":
1136 decoded = {
1137 str(k): v
1138 for k, v in loads(json, precise_float=self.precise_float).items()
1139 }
ValueError: Unexpected character found when decoding array value (1)
I then opened the file as JSON in VSCode and checked line 2, column 914, where I found a tab character instead of a space right after that column.
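The same position can also be found without an editor. Below is a minimal sketch, assuming the file holds one JSON object per line (which is what `lines=True` implies). The helper name `find_bad_lines` is hypothetical, and it relies on the standard-library `json` module rather than pandas' own parser, so the messages will not match the pandas `ValueError` word for word.

```python
import json

def find_bad_lines(lines):
    """Return (line_number, column, message) for every line that fails to parse."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines, they are not records
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            # colno is the 1-based column where the parser gave up
            problems.append((number, err.colno, err.msg))
    return problems

# Hypothetical sample data: the second record has a real tab inside a string
sample = [
    '{"id": 1, "text": "ok"}',
    '{"id": 2, "text": "raw\ttab"}',
]
problems = find_bad_lines(sample)
for number, column, message in problems:
    print(f"line {number}, column {column}: {message}")
```

For a real file, passing the open file handle works the same way, for example `find_bad_lines(open('raw_data.json', encoding='utf-8'))`.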
To fix this, I used a regex find-and-replace to replace every tab character with four spaces.
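The same replacement can be sketched in Python. This is only an illustration under assumptions: the helper name `replace_tabs` and the sample string are made up, and the parsing check uses the standard-library `json` module, which rejects raw control characters inside strings by default.

```python
import json
import re

def replace_tabs(text, spaces=4):
    """Replace every tab character with a fixed number of spaces."""
    return re.sub(r"\t", " " * spaces, text)

# Hypothetical broken record: a real tab character inside the string value
broken = '{"text": "raw\ttab"}'
fixed = replace_tabs(broken)

record = json.loads(fixed)  # parses now that the tab is gone
print(record)
```

For a whole file, read its text, pass it through `replace_tabs`, and write the result to a new file so the original stays untouched.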
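Side note: my JSON contained many hard-coded \n newline sequences, and I thought I would have to remove those as well, but these hard-coded \n sequences do no harm, so you can keep them.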
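A quick way to confirm that an escaped \n is harmless, sketched with the standard-library `json` module and a made-up sample string:

```python
import json

# Raw string literal: the text contains a backslash followed by "n",
# which is the valid JSON escape for a newline
escaped = r'{"text": "first line\nsecond line"}'

record = json.loads(escaped)
print(record["text"].splitlines())  # → ['first line', 'second line']
```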
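You may find other red markers in the JSON view of VSCode or another JSON editor. I also ran into a "JSONDecodeError"; both fixes are covered together in: How to fix the error "JSONDecodeError: Expecting value: ..." when loading a json file with json.load()?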