Linking nested JSON data with pandas

Question  Votes: -1  Answers: 1

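I'm just starting out with pandas, so any help would be appreciated. My JSON has nested elements, and I want to break them out into separate tables while keeping the relationship between the nesting levels. The result will later be loaded into a database so the data can be reported on. A sample of the data: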

  "activities": [
{
  "activityId": "a0a0ea45-b422-460f-b24b-540324124401",
  "activityStart": "2020-06-02T01:13:52.178Z",
  "activityEnd": "2020-06-02T01:17:48.800Z",
  "users": [
    {
      "userId": "8cbc5047-fc60-45b8-8cd2-52d0934dabdc",
      "userName": "ABC",
      "sessions": [
        {
          "sessionId": "9822a58f-c8be-4834-88ba-c297f138b33b",
          "segments": [
            {
              "segmentStart": "2020-06-02T01:13:52.181Z",
              "segmentEnd": "2020-06-02T01:13:52.226Z",
              "segmentType": "cold"
            },
            {
              "segmentStart": "2020-06-02T01:13:52.226Z",
              "segmentEnd": "2020-06-02T01:17:18.651Z",
              "segmentType": "warm"
            }
          ],
          "metrics": [
            {
              "name": "tDelay",
              "value": 1
            }
          ],

          "executionTagProvided": true
        }
      ]
    },
    {
      "UserId": "3e9dc85d-1427-4df7-a73b-75fd4d91148f",
      .....

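The end goal is to have the following tables: activities, users, segments, and metrics, with the ID of the parent level carried through so that there is a relationship between them, i.e. segments and metrics are associated with the sessionId, sessions with the userId, and so on.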

python-3.x pandas
1 Answer
0 votes

You can do it like this, and then select only the columns you need from each DataFrame:

import json

import pandas as pd

# Load the raw JSON document (read-only access is enough).
with open('1.json', 'r') as f:
    x = json.load(f)

# One row per activity.
df_a = pd.json_normalize(x['activities'])
print(df_a)

# One row per user, carrying the parent activityId.
df_users = pd.json_normalize(x['activities'], record_path=['users'], meta=['activityId'])
print(df_users)

# One row per session, carrying activityId and userId.
df_sessions = pd.json_normalize(x['activities'], record_path=['users', 'sessions'], meta=[['activityId'], ['users', 'userId']])
print(df_sessions)

# One row per segment, carrying activityId, userId and sessionId.
df_segment = pd.json_normalize(x['activities'], record_path=['users', 'sessions', 'segments'], meta=[['activityId'], ['users', 'userId'], ['users', 'sessions', 'sessionId']])
print(df_segment)

# One row per metric, carrying the same parent keys.
df_metrics = pd.json_normalize(x['activities'], record_path=['users', 'sessions', 'metrics'], meta=[['activityId'], ['users', 'userId'], ['users', 'sessions', 'sessionId']])
print(df_metrics)

Output:

                             activityId             activityStart               activityEnd                                              users
0  a0a0ea45-b422-460f-b24b-540324124401  2020-06-02T01:13:52.178Z  2020-06-02T01:17:48.800Z  [{'userId': '8cbc5047-fc60-45b8-8cd2-52d0934da...
                                 userId userName                                           sessions                            activityId
0  8cbc5047-fc60-45b8-8cd2-52d0934dabdc      ABC  [{'sessionId': '9822a58f-c8be-4834-88ba-c297f1...  a0a0ea45-b422-460f-b24b-540324124401
                              sessionId  ...                          users.userId
0  9822a58f-c8be-4834-88ba-c297f138b33b  ...  8cbc5047-fc60-45b8-8cd2-52d0934dabdc

[1 rows x 6 columns]
               segmentStart                segmentEnd  ...                          users.userId              users.sessions.sessionId
0  2020-06-02T01:13:52.181Z  2020-06-02T01:13:52.226Z  ...  8cbc5047-fc60-45b8-8cd2-52d0934dabdc  9822a58f-c8be-4834-88ba-c297f138b33b
1  2020-06-02T01:13:52.226Z  2020-06-02T01:17:18.651Z  ...  8cbc5047-fc60-45b8-8cd2-52d0934dabdc  9822a58f-c8be-4834-88ba-c297f138b33b

[2 rows x 6 columns]
     name  value                            activityId                          users.userId              users.sessions.sessionId
0  tDelay      1  a0a0ea45-b422-460f-b24b-540324124401  8cbc5047-fc60-45b8-8cd2-52d0934dabdc  9822a58f-c8be-4834-88ba-c297f138b33b
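
To go one step further toward the database the question mentions, here is a rough sketch of the "select only the columns you need" step followed by a load into SQLite. It is only an illustration under a few assumptions: the inline `data` dict is a shortened, hypothetical stand-in for the real file (IDs abbreviated to `a1`, `u1`, `s1`), the table names and the in-memory SQLite database are my own choices, and each child table keeps just the key of its direct parent.

```python
import sqlite3

import pandas as pd

# Hypothetical, shortened sample with the same shape as the JSON in the question.
data = {
    "activities": [{
        "activityId": "a1",
        "activityStart": "2020-06-02T01:13:52.178Z",
        "activityEnd": "2020-06-02T01:17:48.800Z",
        "users": [{
            "userId": "u1",
            "userName": "ABC",
            "sessions": [{
                "sessionId": "s1",
                "segments": [
                    {"segmentStart": "2020-06-02T01:13:52.181Z",
                     "segmentEnd": "2020-06-02T01:13:52.226Z",
                     "segmentType": "cold"},
                    {"segmentStart": "2020-06-02T01:13:52.226Z",
                     "segmentEnd": "2020-06-02T01:17:18.651Z",
                     "segmentType": "warm"},
                ],
                "metrics": [{"name": "tDelay", "value": 1}],
                "executionTagProvided": True,
            }],
        }],
    }]
}
acts = data["activities"]

# Activities and users: drop the column that still holds the nested list.
df_a = pd.json_normalize(acts).drop(columns=["users"])
df_users = pd.json_normalize(
    acts, record_path=["users"], meta=["activityId"]
).drop(columns=["sessions"])

# Sessions: keep the session key plus the parent userId, renamed to a plain name.
df_sessions = pd.json_normalize(
    acts, record_path=["users", "sessions"], meta=[["users", "userId"]]
).rename(columns={"users.userId": "userId"})
df_sessions = df_sessions[["sessionId", "userId", "executionTagProvided"]]

# Segments and metrics: carry only the parent sessionId.
session_key = [["users", "sessions", "sessionId"]]
rename_key = {"users.sessions.sessionId": "sessionId"}
df_segments = pd.json_normalize(
    acts, record_path=["users", "sessions", "segments"], meta=session_key
).rename(columns=rename_key)
df_metrics = pd.json_normalize(
    acts, record_path=["users", "sessions", "metrics"], meta=session_key
).rename(columns=rename_key)

# Load every frame into an in-memory SQLite database (assumed target).
conn = sqlite3.connect(":memory:")
tables = {
    "activities": df_a,
    "users": df_users,
    "sessions": df_sessions,
    "segments": df_segments,
    "metrics": df_metrics,
}
for name, frame in tables.items():
    frame.to_sql(name, conn, index=False)

# The carried-through keys let the tables be joined back together.
joined = pd.read_sql_query(
    """
    SELECT u.userName, s.sessionId, g.segmentType
    FROM segments AS g
    JOIN sessions AS s ON s.sessionId = g.sessionId
    JOIN users AS u ON u.userId = s.userId
    ORDER BY g.segmentStart
    """,
    conn,
)
print(joined)
```

Keeping only the direct parent key in each table avoids repeating activityId and userId on every segment row; if you would rather have all ancestor keys on every table, pass the full `meta` list as in the answer above and rename the dotted columns the same way.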