I have a JSON file that looks like this:
{
"file": "name",
"main": [{
"question_no": "Q.1",
"question": "what is ?",
"answer": [{
"user": "John",
"comment": "It is defined as",
"value": [
{
"my_value": 5,
"value_2": 10
},
{
"my_value": 24,
"value_2": 30
}
]
},
{
"user": "Sam",
"comment": "as John said above it simply means",
"value": [
{
"my_value": 9,
"value_2": 10
},
{
"my_value": 54,
"value_2": 19
}
]
}
],
"closed": "no"
}]
}
Expected result:
question_no  question   my_value_sum  value_2_sum  user  comment
Q.1          what is ?  29            40           John  It is defined as
Q.1          what is ?  63            29           Sam   as John said above it simply means
What I tried is data = json_normalize(file_json, "main"), followed by a for loop like this:
for ans, row in data.iterrows():
....
....
df = df.append(the data)
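The problem is that this takes so long that my client would reject the solution. The main list has around 1200 items, and there are 450 JSON files to convert, so this intermediate conversion step would take almost an hour.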
EDIT: Is it possible to get the sums of my_value and value_2 as columns? (I have also updated the expected result.)
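Select the list stored under the main key and pass it to json_normalize with record_path and meta. With the JSON shown in the question, the result also contains the nested value column, which the printed outputs below leave out: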
import pandas as pd

# file_json is the parsed JSON document from the question
data = pd.json_normalize(file_json["main"],
                         record_path='answer',
                         meta=['question_no', 'question'])
print(data)
user comment question_no question
0 John It is defined as Q.1 what is ?
1 Sam as John said above it simply means Q.1 what is ?
Then, if the column order matters, move the last N columns to the front:
N = 2
data = data[data.columns[-N:].tolist() + data.columns[:-N].tolist()]
print (data)
question_no question user comment
0 Q.1 what is ? John It is defined as
1 Q.1 what is ? Sam as John said above it simply means
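For the summed columns asked about in the EDIT, one possible approach is to keep the nested value column that json_normalize returns and sum each key per answer. The following is only a sketch: the helper name answers_with_sums is made up for illustration, and it assumes every answer has a value list whose entries all contain my_value and value_2.

```python
import pandas as pd

def answers_with_sums(file_json):
    """Return one row per answer, with the nested "value" entries summed.

    Hypothetical helper; assumes every answer has a "value" list whose
    dicts all contain "my_value" and "value_2".
    """
    data = pd.json_normalize(
        file_json["main"],
        record_path="answer",
        meta=["question_no", "question"],
    )
    # Each cell of data["value"] is a list of dicts; sum each key per answer.
    data["my_value_sum"] = [sum(v["my_value"] for v in vals) for vals in data["value"]]
    data["value_2_sum"] = [sum(v["value_2"] for v in vals) for vals in data["value"]]
    # Keep only the columns from the expected result, in that order.
    return data[["question_no", "question", "my_value_sum",
                 "value_2_sum", "user", "comment"]]

# The sample document from the question.
file_json = {
    "file": "name",
    "main": [{
        "question_no": "Q.1",
        "question": "what is ?",
        "answer": [
            {"user": "John", "comment": "It is defined as",
             "value": [{"my_value": 5, "value_2": 10},
                       {"my_value": 24, "value_2": 30}]},
            {"user": "Sam", "comment": "as John said above it simply means",
             "value": [{"my_value": 9, "value_2": 10},
                       {"my_value": 54, "value_2": 19}]},
        ],
        "closed": "no",
    }],
}

result = answers_with_sums(file_json)
print(result)
```

For the 450 files, it is probably faster to collect one DataFrame per file in a list and call pd.concat(frames, ignore_index=True) once at the end than to grow a DataFrame with df.append inside a loop. Note also that DataFrame.append was removed in pandas 2.0.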