I want to convert the following JSON file:
[
    {
        "userid": "5275800381",
        "status": "UserStatus.RECENTLY",
        "name": "Ah",
        "bot": false,
        "username": "None"
    },
    {
        "userid": "5824657725",
        "status": "UserStatus.LAST_MONTH",
        "name": "A45",
        "bot": false,
        "username": "None"
    },
    {
        "userid": "5160075986",
        "status": "UserStatus.RECENTLY",
        "name": "CTLA",
        "bot": false,
        "username": "james888"
    }
]
into a CSV file that has more columns and no duplicate rows, with this header:
username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated
The output file should be:
username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated
,5275800381,False,False,False,False,False,False,False,False,False,False
,5824657725,False,False,False,False,False,False,False,False,False,False
james888,5160075986,False,False,False,False,False,False,False,False,False,False
I tried the following code:
import json
with open('target_user2.json', 'r', encoding='utf-8') as fp:
    target = json.load(fp) #this file contains the json
    with open('members2.csv', 'w', encoding='utf-8') as nf: # target_userid2.txt or target_userid2.json
        nf.write('username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated' + '\n')
        for item in target:
            if item['user id'] in [x['user id'] for x in target]:
                if item['username'] != "None":
                    item['username'] == ""
                record = item['username'] + ',' + item['user id'] + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False'
                nf.write(json.dumps(record).replace('"', '') + '\n') # write data without ""
It doesn't work: the error is raised by item['user id'] (the key with a space does not work), whereas item['userid'] works.
How can I fix this?
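Use the csv module to write the CSV file instead of formatting the rows yourself.
Use a set to detect duplicate user IDs and skip them.
Fix the logic that replaces a "None" username with an empty string: the test should be == "None", and the replacement needs an assignment (=), not a comparison (==).
When reading the JSON, use userid as the key, not user id.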
import csv
import json

# Load the JSON list of user records.
with open('target_user2.json', 'r', encoding='utf-8') as fp:
    target = json.load(fp)

userids = set()  # user IDs that have already been written
# newline='' lets the csv module control the line endings itself.
with open('members2.csv', 'w', encoding='utf-8', newline='') as nf:
    nf_csv = csv.writer(nf)
    nf_csv.writerow(['username', 'user id', 'access hash', 'name', 'group', 'group id', 'is_bot', 'is_admin', 'dc_id', 'have_photo', 'phone', 'elaborated'])
    for item in target:
        if item['userid'] not in userids: # prevent duplicate userids
            userids.add(item['userid'])
            if item['username'] == "None":
                item['username'] = ""
            record = [item['username'], item['userid'], 'False', 'False', 'False', 'False', 'False', 'False', 'False', 'False', 'False', 'False']
            nf_csv.writerow(record)
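
If you would rather not type out ten 'False' strings per row, a variant with csv.DictWriter may be easier to maintain. This is only a sketch under a few assumptions: the function name members_to_csv and the in-memory io.StringIO buffer are illustrative choices of mine, not part of the original code, and the sample list is a cut-down version of the JSON from the question with one duplicate added. The restval argument fills every column that is not set explicitly.

```python
import csv
import io

# Column names copied from the header in the question.
FIELDNAMES = ['username', 'user id', 'access hash', 'name', 'group', 'group id',
              'is_bot', 'is_admin', 'dc_id', 'have_photo', 'phone', 'elaborated']


def members_to_csv(members):
    """Return CSV text for members, keeping the first record per userid.

    Hypothetical helper for illustration only.
    """
    buffer = io.StringIO(newline='')
    # restval='False' is written for every column missing from a row dict.
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, restval='False',
                            lineterminator='\n')
    writer.writeheader()
    seen = set()  # user IDs that have already been written
    for item in members:
        userid = item['userid']
        if userid in seen:
            continue  # skip duplicate user IDs
        seen.add(userid)
        username = item['username']
        if username == 'None':
            username = ''  # the JSON stores a missing username as "None"
        writer.writerow({'username': username, 'user id': userid})
    return buffer.getvalue()


# Sample data (assumed): two records from the question plus one duplicate.
sample = [
    {'userid': '5275800381', 'username': 'None'},
    {'userid': '5160075986', 'username': 'james888'},
    {'userid': '5275800381', 'username': 'None'},
]
print(members_to_csv(sample))
```

To write a file instead, the same DictWriter can be built on a handle opened with open('members2.csv', 'w', encoding='utf-8', newline='').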