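I'm just getting started with Pandas, so any help would be appreciated. My JSON has nested elements, and I want to break them out into separate tables while keeping the relationships between the nesting levels. The result will later be loaded into a database so the data can be reported on. A sample of the data: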
"activities": [
{
"activityId": "a0a0ea45-b422-460f-b24b-540324124401",
"activityStart": "2020-06-02T01:13:52.178Z",
"activityEnd": "2020-06-02T01:17:48.800Z",
"users": [
{
"userId": "8cbc5047-fc60-45b8-8cd2-52d0934dabdc",
"userName": "ABC",
"sessions": [
{
"sessionId": "9822a58f-c8be-4834-88ba-c297f138b33b",
"segments": [
{
"segmentStart": "2020-06-02T01:13:52.181Z",
"segmentEnd": "2020-06-02T01:13:52.226Z",
"segmentType": "cold"
},
{
"segmentStart": "2020-06-02T01:13:52.226Z",
"segmentEnd": "2020-06-02T01:17:18.651Z",
"segmentType": "warm"
}
],
"metrics": [
{
"name": "tDelay",
"value": 1
}
],
"executionTagProvided": true
}
]
},
{
"userId": "3e9dc85d-1427-4df7-a73b-75fd4d91148f",
.....
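The end goal is to have the following tables: activities, users, segments, and metrics, with the ID of the parent level carried down into each one so the relationships are preserved. That is, segments and metrics are linked to a sessionId, sessions are linked to a userId, and so on.
You can do it like this, and then select only the columns you need from each df: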
import json
import pandas as pd

# Load the raw JSON file
with open('1.json', 'r') as f:
    x = json.load(f)
# Top-level activities (the nested 'users' list stays in a single column)
df_a = pd.json_normalize(x['activities'])
print(df_a)
# One row per user, carrying the parent activityId
df_users = pd.json_normalize(x['activities'], record_path=['users'], meta=['activityId'])
print(df_users)
# One row per session, carrying activityId and userId
df_sessions = pd.json_normalize(x['activities'], record_path=['users', 'sessions'], meta=[['activityId'], ['users', 'userId']])
print(df_sessions)
# One row per segment, carrying activityId, userId and sessionId
df_segment = pd.json_normalize(x['activities'], record_path=['users', 'sessions', 'segments'], meta=[['activityId'], ['users', 'userId'], ['users', 'sessions', 'sessionId']])
print(df_segment)
# One row per metric, carrying the same three parent IDs
df_metrics = pd.json_normalize(x['activities'], record_path=['users', 'sessions', 'metrics'], meta=[['activityId'], ['users', 'userId'], ['users', 'sessions', 'sessionId']])
print(df_metrics)
Output:
activityId activityStart activityEnd users
0 a0a0ea45-b422-460f-b24b-540324124401 2020-06-02T01:13:52.178Z 2020-06-02T01:17:48.800Z [{'userId': '8cbc5047-fc60-45b8-8cd2-52d0934da...

userId userName sessions activityId
0 8cbc5047-fc60-45b8-8cd2-52d0934dabdc ABC [{'sessionId': '9822a58f-c8be-4834-88ba-c297f1... a0a0ea45-b422-460f-b24b-540324124401

sessionId ... users.userId
0 9822a58f-c8be-4834-88ba-c297f138b33b ... 8cbc5047-fc60-45b8-8cd2-52d0934dabdc
[1 rows x 6 columns]

segmentStart segmentEnd ... users.userId users.sessions.sessionId
0 2020-06-02T01:13:52.181Z 2020-06-02T01:13:52.226Z ... 8cbc5047-fc60-45b8-8cd2-52d0934dabdc 9822a58f-c8be-4834-88ba-c297f138b33b
1 2020-06-02T01:13:52.226Z 2020-06-02T01:17:18.651Z ... 8cbc5047-fc60-45b8-8cd2-52d0934dabdc 9822a58f-c8be-4834-88ba-c297f138b33b
[2 rows x 6 columns]

name value activityId users.userId users.sessions.sessionId
0 tDelay 1 a0a0ea45-b422-460f-b24b-540324124401 8cbc5047-fc60-45b8-8cd2-52d0934dabdc 9822a58f-c8be-4834-88ba-c297f138b33b
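To illustrate the "select only the columns you need" step, here is a minimal sketch of one way to trim each frame down to its own scalar columns plus the ID of its direct parent, and then load the result into a database. The inline `data` dict with shortened IDs (`a1`, `u1`, `s1`), the renaming of the dotted meta columns, the table names, and the use of an in-memory SQLite database are all assumptions made for the example, not something the question specifies.

```python
import sqlite3

import pandas as pd

# Hypothetical inline sample mirroring the structure in the question,
# with shortened IDs so the example is self-contained.
data = {
    "activities": [{
        "activityId": "a1",
        "activityStart": "2020-06-02T01:13:52.178Z",
        "activityEnd": "2020-06-02T01:17:48.800Z",
        "users": [{
            "userId": "u1",
            "userName": "ABC",
            "sessions": [{
                "sessionId": "s1",
                "segments": [
                    {"segmentStart": "2020-06-02T01:13:52.181Z",
                     "segmentEnd": "2020-06-02T01:13:52.226Z",
                     "segmentType": "cold"},
                    {"segmentStart": "2020-06-02T01:13:52.226Z",
                     "segmentEnd": "2020-06-02T01:17:18.651Z",
                     "segmentType": "warm"},
                ],
                "metrics": [{"name": "tDelay", "value": 1}],
                "executionTagProvided": True,
            }],
        }],
    }]
}
acts = data["activities"]

# Keep only scalar columns; nested list columns are dropped by the selection.
activities = pd.json_normalize(acts)[["activityId", "activityStart", "activityEnd"]]

users = pd.json_normalize(acts, record_path=["users"], meta=["activityId"])[
    ["userId", "userName", "activityId"]
]

# Meta columns get dotted names such as "users.userId"; rename them to plain keys.
sessions = pd.json_normalize(
    acts, record_path=["users", "sessions"], meta=[["users", "userId"]]
).rename(columns={"users.userId": "userId"})[
    ["sessionId", "executionTagProvided", "userId"]
]

session_key = {"users.sessions.sessionId": "sessionId"}
segments = pd.json_normalize(
    acts,
    record_path=["users", "sessions", "segments"],
    meta=[["users", "sessions", "sessionId"]],
).rename(columns=session_key)
metrics = pd.json_normalize(
    acts,
    record_path=["users", "sessions", "metrics"],
    meta=[["users", "sessions", "sessionId"]],
).rename(columns=session_key)

# Load every table into an in-memory SQLite database (a stand-in for the real target).
conn = sqlite3.connect(":memory:")
tables = {"activities": activities, "users": users, "sessions": sessions,
          "segments": segments, "metrics": metrics}
for name, df in tables.items():
    df.to_sql(name, conn, index=False)

# The carried-down IDs make the tables joinable: segments -> sessions -> users.
joined = pd.read_sql_query(
    "SELECT seg.segmentType, ses.userId "
    "FROM segments seg JOIN sessions ses ON seg.sessionId = ses.sessionId",
    conn,
)
print(joined)
```

Each child table here carries only its direct parent's ID, since the remaining ancestors can be recovered with joins. If you would rather avoid joins when reporting, keep all three meta columns on the lower-level tables, as in the frames above.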