Consider the following list of dictionaries as an example:
d2 = [{'event_id': 't1',
       'display_name': 't1',
       'form_count': 0,
       'repetition_id': None,
       'children': [{'event_id': 't_01',
                     'display_name': 't(1)',
                     'form_count': 1,
                     'repetition_id': 't1',
                     'children': [],
                     'forms': [{'form_id': 'f1',
                                'form_repetition_id': '1',
                                'form_name': 'fff1',
                                'is_active': True,
                                'is_submitted': False}]}],
       'forms': []},
      {'event_id': 't2',
       'display_name': 't2',
       'form_count': 0,
       'repetition_id': None,
       'children': [{'event_id': 't_02',
                     'display_name': 't(2)',
                     'form_count': 1,
                     'repetition_id': 't2',
                     'children': [{'event_id': 't_03',
                                   'display_name': 't(3)',
                                   'form_count': 1,
                                   'repetition_id': 't3',
                                   'children': [],
                                   'forms': [{'form_id': 'f3',
                                              'form_repetition_id': '1',
                                              'form_name': 'fff3',
                                              'is_active': True,
                                              'is_submitted': False}]}],
                     'forms': [{'form_id': 'f2',
                                'form_repetition_id': '1',
                                'form_name': 'fff2',
                                'is_active': True,
                                'is_submitted': False}]}],
       'forms': []}]
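The d2 above is a list of dictionaries, where children is a nested list of dictionaries with the same keys as the parent dictionary.
In addition, children can be nested to any number of levels, and the depth cannot be known in advance. In short, I don't know how many times I would have to keep calling explode.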
Current df:
In [54]: df11 = pd.DataFrame(d2)
In [55]: df11
Out[55]:
event_id display_name form_count repetition_id children forms
0 t1 t1 0 None [{'event_id': 't_01', 'display_name': 't(1)', ... []
1 t2 t2 0 None [{'event_id': 't_02', 'display_name': 't(2)', ... []
I want to flatten it in the following way.
Expected output:
event_id display_name form_count repetition_id children forms
0 t1 t1 0 None {'event_id': 't_01', 'display_name': 't(1)', '... []
1 t2 t2 0 None {'event_id': 't_02', 'display_name': 't(2)', '... []
0 t_01 t(1) 1 t1 [] [{'form_id': 'f1', 'form_repetition_id': '1', ...
1 t_02 t(2) 1 t2 {'event_id': 't_03', 'display_name': 't(3)', ... [{'form_id': 'f2', 'form_repetition_id': '1', ...
0 t_03 t(3) 0 t3 [] [{'form_id': 'f2', 'form_repetition_id': '1'}]
How can I find out how many levels of nested children there are?
My attempt:
In [58]: df12 = df11.explode('children')
In [64]: final = pd.concat([df12, pd.json_normalize(df12.children)])
In [72]: final
Out[72]:
event_id display_name form_count repetition_id children forms
0 t1 t1 0 None {'event_id': 't_01', 'display_name': 't(1)', '... []
1 t2 t2 0 None {'event_id': 't_02', 'display_name': 't(2)', '... []
0 t_01 t(1) 1 t1 [] [{'form_id': 'f1', 'form_repetition_id': '1', ...
1 t_02 t(2) 1 t2 [{'event_id': 't_03', 'display_name': 't(3)', ... [{'form_id': 'f2', 'form_repetition_id': '1', ...
Any help would be greatly appreciated.
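This can be solved with a small queue-based (breadth-first) traversal, so the nesting depth does not need to be known in advance and no explicit recursion is required: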
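from collections import deque

import pandas as pd

queue = deque(d2)  # start with the top-level events
d3 = []
while queue:
    item = queue.popleft()
    d3.append(item)
    # Optionally add a parent_event_id. Remove if you don't need it.
    # Children are appended to the queue, so they are visited after
    # all events of the current level.
    queue += [
        {**child, "parent_event_id": item["event_id"]}
        for child in item.get("children", [])
    ]

df = pd.DataFrame(d3)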
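If you also want to know how deep the nesting goes, one option is a recursive variant that records a depth for every event. The sketch below is only an illustration under a few assumptions: flatten_events, the depth column and the small sample list are names made up here (sample is a cut-down stand-in for d2 with the same shape), and it drops the children key from each row, which the queue version above does not do.

```python
def flatten_events(nodes, depth=0, parent_event_id=None):
    """Yield every event depth-first, without its 'children' key."""
    for node in nodes:
        # Copy every key except 'children', then record where the node sits.
        row = {k: v for k, v in node.items() if k != "children"}
        row["parent_event_id"] = parent_event_id
        row["depth"] = depth
        yield row
        # Recurse into the children, however deep they go.
        yield from flatten_events(
            node.get("children", []), depth + 1, node["event_id"]
        )


# Hypothetical stand-in for d2: same structure, fewer keys.
sample = [
    {"event_id": "t1", "children": [{"event_id": "t_01", "children": []}]},
    {"event_id": "t2", "children": [
        {"event_id": "t_02", "children": [
            {"event_id": "t_03", "children": []},
        ]},
    ]},
]

rows = list(flatten_events(sample))
# Deepest nesting level found; top-level events have depth 0.
max_depth = max(row["depth"] for row in rows)
```

Passing rows to pd.DataFrame(rows) should then give one row per event, as in the queue version, with the extra depth column. Note that the order differs: this walks depth-first (a parent is followed directly by its descendants), whereas the queue walks level by level.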