Error when trying to read a JSON file with pandas

Question (votes: 0, answers: 1)

I'm using Python (pandas) to read a JSON file containing raw tweets, but I get the following error:

ValueError: Unexpected character found when decoding array value (2)

I'd appreciate any help.

Edit: here is a sample of the JSON:

{"created_at":"Sat Nov 16 14:15:52 +0000 2019","id":1195707056365461505,"id_str":"1195707056365461505","text":"Any Arsenal red members on here, please DM me. ..Have a few questions\ud83d\ude05\ud83e\udd14","source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":974846850,"id_str":"974846850","name":"Rico Rodrigo","screen_name":"DatGuyTy_online","location":"Brum","url":null,"description":"Aspiring accountant x Arsenal fan x anime fan","translator_type":"none","protected":false,"verified":false,"followers_count":647,"friends_count":901,"listed_count":9,"favourites_count":24989,"statuses_count":24628,"created_at":"Tue Nov 27 22:25:31 +0000 2012","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":null,"contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/1071377159682514945/Np4nGX5m_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/1071377159682514945/Np4nGX5m_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/974846850/1554183093","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1573913752057"}

This is the code I use to read the file:

import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
import json
import os

tweet_file = 'raw_data.json'
# lines=True: the file is JSON Lines, i.e. one tweet object per line
tweets = pd.read_json(tweet_file, convert_dates=True, lines=True, encoding='utf-8')

Tags: python, json, python-3.x, pandas, twitter
1 Answer (0 votes)

I hit this error with my own JSON file when trying to load it with pandas:

File ~/.local/lib/python3.9/site-packages/pandas/io/json/_json.py:1133, in FrameParser._parse_no_numpy(self)
   1129 orient = self.orient
   1131 if orient == "columns":
   1132     self.obj = DataFrame(
-> 1133         loads(json, precise_float=self.precise_float), dtype=None
   1134     )
   1135 elif orient == "split":
   1136     decoded = {
   1137         str(k): v
   1138         for k, v in loads(json, precise_float=self.precise_float).items()
   1139     }

ValueError: Unexpected character found when decoding array value (1)

I then opened the file in VSCode's JSON view, checked line 2, column 914, and found that the character after that column was a tab instead of a space.
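If you'd rather not hunt for the red marker by eye, a rough sketch like the following can locate the offending lines of a JSON Lines file (one object per line, which is what lines=True expects). It uses only the standard json module rather than pandas' parser, so the messages will differ from the pandas error; the helper name find_bad_lines is hypothetical. Python's json.JSONDecodeError exposes the position through its colno and msg attributes.

```python
import json


def find_bad_lines(path, limit=5):
    """Hypothetical helper: report lines of a JSON Lines file that fail to parse.

    Returns a list of (line_number, column, message) tuples (1-based),
    stopping after `limit` problems.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                # Skip blank lines, e.g. a trailing empty line at the end of the file.
                continue
            try:
                # strict=True is the default, so raw tabs inside strings are rejected.
                json.loads(line)
            except json.JSONDecodeError as err:
                # err.colno is the 1-based column within this single line.
                problems.append((line_no, err.colno, err.msg))
                if len(problems) >= limit:
                    break
    return problems
```

For example, print(find_bad_lines("raw_data.json")) lists up to five unparseable lines with their line and column numbers, which you can then jump to in the editor.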

To fix this, I used a regex to replace all tab characters with four spaces.
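The exact regex isn't shown here; a minimal sketch of that replacement might look like this, assuming the whole file fits in memory and that tabs are the only problem characters. The helper name read_json_lines_without_tabs is hypothetical. The text is wrapped in io.StringIO because recent pandas versions (2.1+) deprecate passing literal JSON strings to read_json.

```python
import io
import re

import pandas as pd


def read_json_lines_without_tabs(path):
    """Hypothetical helper: replace every tab with four spaces, then parse as JSON Lines."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # str.replace("\t", "    ") would do the same; re.sub mirrors the regex approach.
    cleaned = re.sub(r"\t", "    ", text)
    # StringIO makes read_json treat the cleaned text as a file-like object.
    return pd.read_json(io.StringIO(cleaned), lines=True)
```

Be aware that this also turns tabs inside string values (for example in tweet text) into four spaces, so the data changes slightly. Escaped tabs written as the two characters backslash and t are not affected.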

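Side note: I had a JSON file with many hard-coded \n newline characters and thought I would have to remove them as well, but those hard-coded \n sequences are harmless; you can leave them in.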
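The difference between the two cases can be seen with the standard json module, used here only as a stand-in (pandas uses its own parser, whose messages differ). An escaped \n (backslash followed by n) is a legal JSON escape sequence, whereas JSON (RFC 8259) requires control characters such as a raw tab to be escaped inside strings. json.loads enforces that rule unless strict=False is passed.

```python
import json

# Backslash + "n" inside a JSON string is an escape sequence: valid JSON.
escaped_newline = '{"text": "line one\\nline two"}'
print(json.loads(escaped_newline)["text"])  # the decoded value contains a real newline

# A literal tab character inside a string is an unescaped control character.
raw_tab = '{"text": "col1\tcol2"}'
try:
    json.loads(raw_tab)
except json.JSONDecodeError as err:
    print("strict parse failed:", err.msg)

# strict=False tells the stdlib parser to accept control characters inside strings.
print(repr(json.loads(raw_tab, strict=False)["text"]))
```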

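You may find other red markers in the JSON view of VSCode or another JSON editor. I also ran into a "JSONDecodeError"; for both fixes in one place, see: How to fix the error "JSONDecodeError: Expecting value: ..." when loading a JSON file with json.load()?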

© www.soinside.com 2019 - 2024. All rights reserved.