Pandas: explode a column of nested dictionary lists and append the results as new rows


Consider the following list of dictionaries as an example:

d2 = [{'event_id': 't1',
  'display_name': 't1',
  'form_count': 0,
  'repetition_id': None,
  'children': [{'event_id': 't_01',
    'display_name': 't(1)',
    'form_count': 1,
    'repetition_id': 't1',
    'children': [],
    'forms': [{'form_id': 'f1',
      'form_repetition_id': '1',
      'form_name': 'fff1',
      'is_active': True,
      'is_submitted': False}]}],
  'forms': []},
 {'event_id': 't2',
  'display_name': 't2',
  'form_count': 0,
  'repetition_id': None,
  'children': [{'event_id': 't_02',
    'display_name': 't(2)',
    'form_count': 1,
    'repetition_id': 't2',
    'children': [{'event_id': 't_03',
      'display_name': 't(3)',
      'form_count': 1,
      'repetition_id': 't3',
      'children': [],
      'forms': [{'form_id': 'f3',
        'form_repetition_id': '1',
        'form_name': 'fff3',
        'is_active': True,
        'is_submitted': False}]}],
    'forms': [{'form_id': 'f2',
      'form_repetition_id': '1',
      'form_name': 'fff2',
      'is_active': True,
      'is_submitted': False}]}],
  'forms': []}]

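Here d2 is a list of dictionaries. Each children value is a list of nested dictionaries with the same keys as the parent dictionary.

In addition, children can be nested to any number of levels, and the depth cannot be known in advance. In short, I don't know how many times I would need to keep exploding.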

Current DataFrame:

In [54]: df11 = pd.DataFrame(d2)

In [55]: df11
Out[55]: 
  event_id display_name  form_count repetition_id                                           children forms
0       t1           t1           0          None  [{'event_id': 't_01', 'display_name': 't(1)', ...    []
1       t2           t2           0          None  [{'event_id': 't_02', 'display_name': 't(2)', ...    []

I would like to flatten it as shown below.

Expected output:

 event_id display_name  form_count repetition_id                                           children                                              forms
0       t1           t1           0          None  {'event_id': 't_01', 'display_name': 't(1)', '...                                                 []
1       t2           t2           0          None  {'event_id': 't_02', 'display_name': 't(2)', '...                                                 []
0     t_01         t(1)           1            t1                                                 []  [{'form_id': 'f1', 'form_repetition_id': '1', ...
1     t_02         t(2)           1            t2  {'event_id': 't_03', 'display_name': 't(3)', ...  [{'form_id': 'f2', 'form_repetition_id': '1', ...
0     t_03         t(3)           1            t3                                                 []  [{'form_id': 'f3', 'form_repetition_id': '1', ...

How can I tell how many levels of nested children there are?

My attempt:

In [58]: df12 = df11.explode('children')
In [64]: final = pd.concat([df12, pd.json_normalize(df12.children)])
In [72]: final
Out[72]: 
  event_id display_name  form_count repetition_id                                           children                                              forms
0       t1           t1           0          None  {'event_id': 't_01', 'display_name': 't(1)', '...                                                 []
1       t2           t2           0          None  {'event_id': 't_02', 'display_name': 't(2)', '...                                                 []
0     t_01         t(1)           1            t1                                                 []  [{'form_id': 'f1', 'form_repetition_id': '1', ...
1     t_02         t(2)           1            t2  [{'event_id': 't_03', 'display_name': 't(3)', ...  [{'form_id': 'f2', 'form_repetition_id': '1', ...

Any help would be greatly appreciated.

Tags: python, pandas, nested

1 Answer

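This can be solved with a breadth-first traversal driven by a queue, which handles any nesting depth without knowing it in advance: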

from collections import deque

import pandas as pd

queue = deque(d2)
d3 = []

while queue:
    item = queue.popleft()
    d3.append(item)

    # Optionally add a parent_event_id. Remove if you don't need it.
    queue += [
        {**child, "parent_event_id": item["event_id"]}
        for child in item.get("children", [])
    ]

df = pd.DataFrame(d3)
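
As a variation on the queue-based code above, here is a minimal recursive sketch. It is not part of the original answer: the function name flatten_events and the depth column are hypothetical additions for illustration, and the sample data is a trimmed-down stand-in with the same shape as d2. It drops the children key from each row and records how deep each event sits, so the maximum of the depth column answers the question of how many nesting levels exist.

```python
import pandas as pd


def flatten_events(nodes, parent_event_id=None, depth=0):
    """Yield one flat record per event, visiting children depth-first."""
    for node in nodes:
        # Copy every key except "children" so each row stays flat.
        record = {k: v for k, v in node.items() if k != "children"}
        # Both helper columns below are illustrative assumptions.
        record["parent_event_id"] = parent_event_id
        record["depth"] = depth
        yield record
        # Recurse into the children, one level deeper.
        yield from flatten_events(
            node.get("children", []), node["event_id"], depth + 1
        )


# Trimmed-down sample with the same nesting shape as d2.
sample = [
    {"event_id": "t1", "children": [{"event_id": "t_01", "children": []}]},
    {
        "event_id": "t2",
        "children": [
            {
                "event_id": "t_02",
                "children": [{"event_id": "t_03", "children": []}],
            }
        ],
    },
]

flat = pd.DataFrame(list(flatten_events(sample)))
max_depth = int(flat["depth"].max())  # deepest nesting level found
```

Unlike the queue version, this visits each parent's descendants before moving on to the next top-level event, so the row order differs. For extremely deep nesting, Python's recursion limit could become a concern, in which case the queue-based loop is the safer choice.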