How can I convert a nested dictionary to a pd.DataFrame faster?

Question (votes: 2, answers: 1)

I have a JSON file that looks like this:

{
    "file": "name",
    "main": [{
        "question_no": "Q.1",
        "question": "what is ?",
        "answer": [{
                "user": "John",
                "comment": "It is defined as",
                "value": [
                          {
                            "my_value": 5,
                            "value_2": 10
                          },
                          {
                            "my_value": 24,
                            "value_2": 30
                          }
                          ]
            },
            {
                "user": "Sam",
                "comment": "as John said above it simply means",
                "value": [
                          {
                            "my_value": 9,
                            "value_2": 10
                          },
                          {
                            "my_value": 54,
                            "value_2": 19
                          }
                          ]
            }
        ],
        "closed": "no"
    }]
}

Desired result:

question_no  question   my_value_sum  value_2_sum  user  comment
Q.1          what is ?  29            40           John  It is defined as
Q.1          what is ?  63            29           Sam   as John said above it simply means

What I tried is data = json_normalize(file_json, "main"), followed by a for loop like this:

for ans, row in data.iterrows():
    ....
    ....
    df = df.append(the data)

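The problem with this approach is that it takes so long that my client rejected the solution. There are about 1200 items in the main list, and there are 450 such JSON files to convert, so this intermediate conversion step takes almost an hour.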

Edit: Is it possible to get the sums of my_value and value_2 as columns? (I have also updated the desired result.)

python json pandas dictionary itertools
1 answer

2 votes

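Select the list under main and pass the record_path and meta parameters to json_normalize (with the JSON above, the result also has a value column holding the nested lists, which the printed output omits):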

import pandas as pd

# file_json is the parsed JSON document (a dict), e.g. loaded with json.load
data = pd.json_normalize(file_json["main"],
                         record_path='answer',
                         meta=['question_no', 'question'])
print(data)
   user                             comment question_no   question
0  John                    It is defined as         Q.1  what is ?
1   Sam  as John said above it simply means         Q.1  what is ?
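
Since the question mentions roughly 450 files, the same call can presumably be applied once per file and the results concatenated in a single step, instead of appending row by row. The following is only a sketch: answers_frame is a hypothetical helper name, and the two in-memory documents (including "Q.2" and its text) are made-up stand-ins for parsed files.

```python
import pandas as pd

def answers_frame(doc):
    """Flatten one parsed JSON document into one row per answer."""
    return pd.json_normalize(doc["main"], record_path="answer",
                             meta=["question_no", "question"])

# Hypothetical stand-ins for two parsed files; real code would json.load each file.
docs = [
    {"file": "a", "main": [{"question_no": "Q.1", "question": "what is ?",
                            "answer": [{"user": "John", "comment": "It is defined as"}]}]},
    {"file": "b", "main": [{"question_no": "Q.2", "question": "why ?",
                            "answer": [{"user": "Sam", "comment": "because"}]}]},
]

# Build every per-file frame first, then concatenate once at the end.
combined = pd.concat([answers_frame(d) for d in docs], ignore_index=True)
print(combined)
```

Concatenating once avoids copying the growing DataFrame on every iteration, which is usually the expensive part of an append-in-a-loop pattern.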

Then, if the column order matters, move the last N columns to the front:

N = 2
data = data[data.columns[-N:].tolist() + data.columns[:-N].tolist()]
print (data)
  question_no   question  user                             comment
0         Q.1  what is ?  John                    It is defined as
1         Q.1  what is ?   Sam  as John said above it simply means
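
For the edit asking about the sums, one possible approach (a sketch, not necessarily the fastest) is to keep the value column that json_normalize leaves as a list of dicts and sum each key per row. The column names my_value_sum and value_2_sum are taken from the desired result in the question; the sample data is copied from the JSON there.

```python
import pandas as pd

file_json = {
    "file": "name",
    "main": [{
        "question_no": "Q.1",
        "question": "what is ?",
        "answer": [
            {"user": "John", "comment": "It is defined as",
             "value": [{"my_value": 5, "value_2": 10},
                       {"my_value": 24, "value_2": 30}]},
            {"user": "Sam", "comment": "as John said above it simply means",
             "value": [{"my_value": 9, "value_2": 10},
                       {"my_value": 54, "value_2": 19}]},
        ],
        "closed": "no",
    }]
}

data = pd.json_normalize(file_json["main"], record_path="answer",
                         meta=["question_no", "question"])

# Each cell of the "value" column is a list of dicts; sum each key per row.
data["my_value_sum"] = data["value"].apply(lambda vals: sum(v["my_value"] for v in vals))
data["value_2_sum"] = data["value"].apply(lambda vals: sum(v["value_2"] for v in vals))

# Select the columns in the order of the desired result.
result = data[["question_no", "question", "my_value_sum", "value_2_sum",
               "user", "comment"]]
print(result)
```

Selecting the columns by name at the end also fixes the order, so the N-based reordering is not needed in this variant.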