Normalizing nested JSON data with Pandas / Python

Question

I am trying to normalize sample data similar to the following:

{
  "2018-04-26 10:09:33": [
    {
      "user_id": "M8BE957ZA",
      "ts": "2018-04-26 10:06:33",
      "message": "Hello"
    }
  ],
  "2018-04-27 19:10:55": [
    {
      "user_id": "M5320QS1X",
      "ts": "2018-04-27 19:10:55",
      "message": "Thank you"
    }
  ],

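I know I can use json_normalize(data, '2018-04-26 10:09:33', record_prefix='') to create a table in pandas, but the date/time keys keep changing. How can I normalize the data so that I end up with the following? Any suggestions are welcome.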

                          user_id         ts                    message

2018-04-26 10:09:33       M8BE957ZA       2018-04-26 10:06:33   Hello
2018-04-27 19:10:55       M5320QS1X       2018-04-27 19:10:55   Thank you
python json pandas normalize
1 Answer
import pandas as pd

test = {
  "2018-04-26 10:09:33": [
    {
      "user_id": "M8BE957ZA",
      "ts": "2018-04-26 10:06:33",
      "message": "Hello"
    }
  ],
  "2018-04-27 19:10:55": [
    {
      "user_id": "M5320QS1X",
      "ts": "2018-04-27 19:10:55",
      "message": "Thank you"
    }
  ]
}

# One column per timestamp key; melt() turns those columns into rows
df = pd.DataFrame(test).melt()


    variable            value
0   2018-04-26 10:09:33 {'user_id': 'M8BE957ZA', 'ts': '2018-04-26 10:...
1   2018-04-27 19:10:55 {'user_id': 'M5320QS1X', 'ts': '2018-04-27 19:...

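Read the dictionary into a DataFrame, then melt it to get the structure above. Next, apply json_normalize to the value column and join the result back onto the variable column, like this: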

df.join(pd.json_normalize(df['value'].tolist())).drop(columns='value').rename(columns={'variable': 'date'})

    date                user_id     ts                  message
0   2018-04-26 10:09:33 M8BE957ZA   2018-04-26 10:06:33 Hello
1   2018-04-27 19:10:55 M5320QS1X   2018-04-27 19:10:55 Thank you
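
The melt approach above relies on every timestamp key holding a list of the same length. As an alternative, here is a minimal sketch that flattens the dictionary with a plain comprehension before building the DataFrame, so each timestamp can hold any number of messages. The helper name flatten_by_timestamp and the column name date are illustrative choices, not part of the original question or answer.

```python
import pandas as pd


def flatten_by_timestamp(data):
    """Return one row per message, with the outer key stored in a 'date' column."""
    rows = [
        {"date": key, **record}      # copy the outer key into each record
        for key, records in data.items()
        for record in records        # a key may hold any number of records
    ]
    return pd.DataFrame(rows)


sample = {
    "2018-04-26 10:09:33": [
        {"user_id": "M8BE957ZA", "ts": "2018-04-26 10:06:33", "message": "Hello"},
    ],
    "2018-04-27 19:10:55": [
        {"user_id": "M5320QS1X", "ts": "2018-04-27 19:10:55", "message": "Thank you"},
    ],
}

flat = flatten_by_timestamp(sample)
print(flat)
```

If the timestamps should be the index rather than a column, calling flat.set_index('date') on the result should give a layout close to the one requested in the question.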