将熊猫分层数据框架转换为JSON

问题描述 投票:1回答:3

我有一个熊猫DataFrame,其中包含数据库中的层次结构元素。 parent_id列告诉我每个孩子属于哪个父级。我想从此DataFrame生成一个嵌套的JSON(有关所需的格式,请参见第3节)。我尝试了很多事情,但是我主要停留在每个父项的_children列表字段中,该字段应包含其所有子项。非常感谢任何帮助,谢谢!

1。熊猫DataFrame(我所拥有的)

parent_id    child_id   level
125582       214659     1
125582       214633     1
125582       214263     1
214263       131673     2
214263       125579     2
214263       125578     2
214263       172670     2
214263       214266     2
214266       216675     3
214266       216671     3
172670       172669     3
172670       174777     3
172670       207661     3
207661       216529     4
207661       223884     4
223884       223885     5
223885       229186     6
229186       219062     7
229186       222243     7
and so on...

2。 DataFrame逻辑

125582 (level 0 - root)
--214659 (level 1)
--214633 (level 1)
--214263 (level 1)
----131673 (level 2)
----125579 (level 2)
----125578 (level 2)
----172670 (level 2)
------172669 (level 3)
------174777 (level 3)
------207661 (level 3)
--------216529 (level 4)
--------223884 (level 4)
----------223885 (level 5)
------------229186 (level 6)
--------------219062 (level 7)    
--------------222243 (level 7)
----214266 (level 2)
------216675 (level 3)
------216671 (level 3)
and so on...

3。 JSON(我需要)

[
    {id: 125582, level: 0, "_children": [
        {id: 214659, level: 1},
        {id: 214633, level: 1},
        {id: 214263, level: 1, "_children": [
            {id: 131673, level: 2},
            {id: 125579, level: 2},
            {id: 125578, level: 2},
            {id: 172670, level: 2, "_children": [
                {id: 172669, level: 3},
                {id: 174777, level: 3},
                {id: 207661, level: 3, "_children": [
                    {id: 216529, level: 4},
                    {id: 223884, level: 4, "_children": [
                        {id: 223885, level: 5, "_children": [
                            {id: 229186, level: 6, "_children": [
                                {id: 219062, level: 7},
                                {id: 222243, level: 7}
                            ]}
                        ]}
                    ]}
                ]}
            ]},
            {id: 214266, level: 2, "_children": [
                {id: 216675, level: 3},
                {id: 216671, level: 3}
            ]}
        ]},
        and so on...
    ]}
]
python json pandas tabulator
3个回答
2
投票

此分层json具有父级->子级关系。因此,建议您使用networkx包从日期框架构造图形g。从图g创建所需的json。

您需要pipnetworkx之后安装this instruction

之后,请执行以下步骤:

  • 将networkx导入为nx
  • 使用g创建有向图from_pandas_edgelist
  • 使用dparent_id上的唯一键创建字典child_id值为level
  • d分配给图level的节点属性g
  • 最后,在nx.tree_data上呼叫g

代码如下:

import networkx as nx

#create the directed graph `g` using `from_pandas_edgelist`
g = nx.from_pandas_edgelist(df, 'parent_id', 'child_id', ['level'], 
                            create_using=nx.DiGraph())

#create a dictionary with unique key on `parent_id` and `child_id`
df1 = df.melt('level')
d = dict(zip(df1.value, df1.level))
root_id = df.loc[0, 'parent_id']
d[root_id] = 0  #set level of root to `0`

#assign `d` to nodes attributes `level` of graph `g`
nx.set_node_attributes(g, d, 'level')    #add `level` values to nodes of `g`

out = [nx.tree_data(g, root_id, {'id': 'id', 'children': '_children'})]

Out[119]:
[{'level': 0,
  'id': 125582,
  '_children': [{'level': 1, 'id': 214659},
   {'level': 1, 'id': 214633},
   {'level': 1,
    'id': 214263,
    '_children': [{'level': 2, 'id': 131673},
     {'level': 2, 'id': 125579},
     {'level': 2, 'id': 125578},
     {'level': 2,
      'id': 172670,
      '_children': [{'level': 3, 'id': 172669},
       {'level': 3, 'id': 174777},
       {'level': 3,
        'id': 207661,
        '_children': [{'level': 4, 'id': 216529},
         {'level': 4,
          'id': 223884,
          '_children': [{'level': 5,
            'id': 223885,
            '_children': [{'level': 6,
              'id': 229186,
              '_children': [{'level': 7, 'id': 219062},
               {'level': 7, 'id': 222243}]}]}]}]}]},
     {'level': 2,
      'id': 214266,
      '_children': [{'level': 3, 'id': 216675},
       {'level': 3, 'id': 216671}]}]}]}]

Note:上面的输出是list内的字典,是您想要的输出。我的解决方案输出将键level放在键id的前面,但是子词典的层次/结构与所需的输出相同


另一方面,如果所需输出中的不需要level,则直接使用内置的nx.tree_data函数按如下所示返回输出

import networkx as nx

g = nx.from_pandas_edgelist(df, 'parent_id', 'child_id', ['level'], 
                            create_using=nx.DiGraph())

root_id = df.loc[0, 'parent_id']    
out = [nx.tree_data(g, root_id, {'id': 'id', 'children': '_children'})]

Out[168]:
[{'id': 125582,
  '_children': [{'id': 214659},
   {'id': 214633},
   {'id': 214263,
    '_children': [{'id': 131673},
     {'id': 125579},
     {'id': 125578},
     {'id': 172670,
      '_children': [{'id': 172669},
       {'id': 174777},
       {'id': 207661,
        '_children': [{'id': 216529},
         {'id': 223884,
          '_children': [{'id': 223885,
            '_children': [{'id': 229186,
              '_children': [{'id': 219062}, {'id': 222243}]}]}]}]}]},
     {'id': 214266, '_children': [{'id': 216675}, {'id': 216671}]}]}]}]

更新

使用nx.tree_data的修改后的功能可以克服对每个孩子单亲的限制。将此自定义函数添加到您的代码中,然后调用它,而不是nx.tree_data

from itertools import chain

def tree_data_custom(G, root):
    id_ = 'id'
    children = '_children'
    def add_children(n, G):
        nbrs = G[n]
        if len(nbrs) == 0:
            return []
        children_ = []
        for child in nbrs:
            d = dict(chain(G.nodes[child].items(), [(id_, child)]))
            c = add_children(child, G)
            if c:
                d[children] = c
            children_.append(d)
        return children_

    data = dict(chain(G.nodes[root].items(), [(id_, root)]))    
    data[children] = add_children(root, G)
    return data

代替呼叫out = [nx.tree_data(g, root_id, {'id': 'id', 'children': '_children'})],只需如下呼叫tree_data_custom

out =  [tree_data_custom(g, root_id)]

更新2:将name列添加到数据框

样本df,其中孩子有多个父母

Out[258]:
    parent_id  child_id  level name
0      125582    214659      1   a1
1      125582    214633      1   a1
2      125582    214263      1   a1
3      214263    131673      2   a2
4      214263    125579      2   a2
5      214263    125578      2   a2
6      214263    172670      2   a2
7      214263    214266      2   a2
8      214266    216675      3   a3
9      214266    216671      3   a3
10     172670    172669      3   a3
11     172670    174777      3   a3
12     172670    207661      3   a3
13     207661    216529      4   a4
14     207661    223884      4   a4
15     223884    223885      5   a5
16     223885    229186      6   a6
17     229186    219062      7   a7
18     229186    222243      7   a7
19     222243    219187      8   a8
20     222243    245985      8   a8
21     222243    232393      8   a8
22     222243    247138      8   a8
23     222243    228848      8   a8
24     222243    228848      8   a8
25     222243    233920      8   a8
26     222243    233920      8   a8
27     222243    228113      8   a8
28     222243    233767      8   a8
29     222243    235407      8   a8
30     222243    237757      8   a8
31     222243    159091      8   a8
32     222243    159091      8   a8
33     222243    214832      8   a8
34     222243    253990      8   a8
35     222243    231610      8   a8
36     222243    231610      8   a8
37     222243    182323      8   a8
38     222243    242190      8   a8
39     222243    143580      8   a8
40     222243    242188      8   a8
41     222243    143581      8   a8
42     222243    242187      8   a8
43     222243    143582      8   a8
44     222243    242189      8   a8
45     222243    205877      8   a8
46     222243    242823      8   a8
47     222243    140979      8   a8
48     222243    237824      8   a8
49     222243    149933      8   a8
50     222243    149933      8   a8
51     222243    153625      8   a8
52     222243      8392      8   a8
53     222243    162085      8   a8
54     222243    162085      8   a8
55     222243    150691      8   a8
56     222243    147773      8   a8
57     222243    147773      8   a8
58     222243     61070      8   a8
59     222243     61070      8   a8
60     222243    204850      8   a8
61     222243    204850      8   a8
62      61070     46276      9   a9
63      61070     46276      9   a9
64      61070     46276      9   a9
65      61070     46276      9   a9
66     143580    159911      9   a9
67     143580     38958      9   a9
68     182323    159911      9   a9

更改很小,您只需要修改g的节点属性的步骤创建字典。当前,我们仅将属性level添加到g。现在,我们需要为name创建另一个字典,并将其添加到name

的节点的新属性g
#create the directed graph `g` using `from_pandas_edgelist`. 
#You don't need `[level]` in this step
g = nx.from_pandas_edgelist(df, 'parent_id', 'child_id', create_using=nx.DiGraph())

#create a dictionary with unique key on `parent_id` and `child_id`
#`melt` keep 2 columns 'level', 'name' instead of one column 'level'
#dictionary `d_level` for attribute `level` of `g's` nodes
#dictionary `d_name` for attribute `level` of `g's` nodes
df1 = df.melt(['level', 'name'])
d_level = dict(zip(df1.value, df1.level))
d_name = dict(zip(df1.value, df1.name))
root_id = df.loc[0, 'parent_id']
d_level[root_id] = 0  #set `level` of root to `0`

#assign `d_level` to nodes attributes `level` of graph `g` and `d_name` for `name`
nx.set_node_attributes(g, d_level, 'level')    #add `level` values to nodes of `g`
nx.set_node_attributes(g, d_name, 'name')    #add `name` values to nodes of `g`

#use customize `tree_data_custom` defined previously
out = [tree_data_custom(g, root_id)] 

如果向每行添加更多列,则最好为所有列创建一个字典,并按如下所示将一次应用于g的节点

g = nx.from_pandas_edgelist(df, 'parent_id', 'child_id', create_using=nx.DiGraph())

df1 = df.melt(['level', 'name'])
#this single dictionary to create both `level` and `name` attributes of nodes of `g`
d = {v: {'level': l, 'name': n} for v,l,n in zip(df1.value, df1.level, df1.name)}
root_id = df.loc[0, 'parent_id']
d[root_id]['level'] = 0  #set level of root to `0`

nx.set_node_attributes(g, d)    #a single add for both attributes `level`, `name`

out = [tree_data_custom(g, root_id)] 

输出

Out[260]:
[{'level': 0,
  'name': 'a1',
  'id': 125582,
  '_children': [{'level': 1, 'name': 'a1', 'id': 214659},
   {'level': 1, 'name': 'a1', 'id': 214633},
   {'level': 1,
    'name': 'a1',
    'id': 214263,
    '_children': [{'level': 2, 'name': 'a2', 'id': 131673},
     {'level': 2, 'name': 'a2', 'id': 125579},
     {'level': 2, 'name': 'a2', 'id': 125578},
     {'level': 2,
      'name': 'a2',
      'id': 172670,
      '_children': [{'level': 3, 'name': 'a3', 'id': 172669},
       {'level': 3, 'name': 'a3', 'id': 174777},
       {'level': 3,
        'name': 'a3',
        'id': 207661,
        '_children': [{'level': 4, 'name': 'a4', 'id': 216529},
         {'level': 4,
          'name': 'a4',
          'id': 223884,
          '_children': [{'level': 5,
            'name': 'a5',
            'id': 223885,
            '_children': [{'level': 6,
              'name': 'a6',
              'id': 229186,
              '_children': [{'level': 7, 'name': 'a7', 'id': 219062},
               {'level': 7,
                'name': 'a7',
                'id': 222243,
                '_children': [{'level': 8, 'name': 'a8', 'id': 219187},
                 {'level': 8, 'name': 'a8', 'id': 245985},
                 {'level': 8, 'name': 'a8', 'id': 232393},
                 {'level': 8, 'name': 'a8', 'id': 247138},
                 {'level': 8, 'name': 'a8', 'id': 228848},
                 {'level': 8, 'name': 'a8', 'id': 233920},
                 {'level': 8, 'name': 'a8', 'id': 228113},
                 {'level': 8, 'name': 'a8', 'id': 233767},
                 {'level': 8, 'name': 'a8', 'id': 235407},
                 {'level': 8, 'name': 'a8', 'id': 237757},
                 {'level': 8, 'name': 'a8', 'id': 159091},
                 {'level': 8, 'name': 'a8', 'id': 214832},
                 {'level': 8, 'name': 'a8', 'id': 253990},
                 {'level': 8, 'name': 'a8', 'id': 231610},
                 {'level': 8,
                  'name': 'a8',
                  'id': 182323,
                  '_children': [{'level': 9, 'name': 'a9', 'id': 159911}]},
                 {'level': 8, 'name': 'a8', 'id': 242190},
                 {'level': 8,
                  'name': 'a8',
                  'id': 143580,
                  '_children': [{'level': 9, 'name': 'a9', 'id': 159911},
                   {'level': 9, 'name': 'a9', 'id': 38958}]},
                 {'level': 8, 'name': 'a8', 'id': 242188},
                 {'level': 8, 'name': 'a8', 'id': 143581},
                 {'level': 8, 'name': 'a8', 'id': 242187},
                 {'level': 8, 'name': 'a8', 'id': 143582},
                 {'level': 8, 'name': 'a8', 'id': 242189},
                 {'level': 8, 'name': 'a8', 'id': 205877},
                 {'level': 8, 'name': 'a8', 'id': 242823},
                 {'level': 8, 'name': 'a8', 'id': 140979},
                 {'level': 8, 'name': 'a8', 'id': 237824},
                 {'level': 8, 'name': 'a8', 'id': 149933},
                 {'level': 8, 'name': 'a8', 'id': 153625},
                 {'level': 8, 'name': 'a8', 'id': 8392},
                 {'level': 8, 'name': 'a8', 'id': 162085},
                 {'level': 8, 'name': 'a8', 'id': 150691},
                 {'level': 8, 'name': 'a8', 'id': 147773},
                 {'level': 8,
                  'name': 'a8',
                  'id': 61070,
                  '_children': [{'level': 9, 'name': 'a9', 'id': 46276}]},
                 {'level': 8, 'name': 'a8', 'id': 204850}]}]}]}]}]}]},
     {'level': 2,
      'name': 'a2',
      'id': 214266,
      '_children': [{'level': 3, 'name': 'a3', 'id': 216675},
       {'level': 3, 'name': 'a3', 'id': 216671}]}]}]}]

1
投票

这不是熊猫的单行代码,但这是一个应该起作用的递归函数。

import json

import pandas as pd

df = pd.DataFrame(
    [{"parent_id": 125582, "child_id": 214659, "level": 1}, {"parent_id": 125582, "child_id": 214633, "level": 1},
     {"parent_id": 125582, "child_id": 214263, "level": 1}, {"parent_id": 214263, "child_id": 131673, "level": 2},
     {"parent_id": 214263, "child_id": 125579, "level": 2}, {"parent_id": 214263, "child_id": 125578, "level": 2},
     {"parent_id": 214263, "child_id": 172670, "level": 2}, {"parent_id": 214263, "child_id": 214266, "level": 2},
     {"parent_id": 214266, "child_id": 216675, "level": 3}, {"parent_id": 214266, "child_id": 216671, "level": 3},
     {"parent_id": 172670, "child_id": 172669, "level": 3}, {"parent_id": 172670, "child_id": 174777, "level": 3},
     {"parent_id": 172670, "child_id": 207661, "level": 3}, {"parent_id": 207661, "child_id": 216529, "level": 4},
     {"parent_id": 207661, "child_id": 223884, "level": 4}, {"parent_id": 223884, "child_id": 223885, "level": 5},
     {"parent_id": 223885, "child_id": 229186, "level": 6}, {"parent_id": 229186, "child_id": 219062, "level": 7},
     {"parent_id": 229186, "child_id": 222243, "level": 7}]
)


def recurse(parent_id, level):
    # create the base result
    result = {"id": int(parent_id), "level": int(level)}

    # get all of the children of this parent, one level below this one
    children = df[(df.parent_id == parent_id) & (df["level"] == level + 1)]

    # if there are no such children, then return without a _children key
    if children.empty:
        return result

    # otherwise, recurse on each child_id
    result["_children"] = [recurse(child_id, level + 1) for child_id in sorted(children.child_id.unique())]

    return result


# start at an implicit level 0
tree = [recurse(parent_id, 0) for parent_id in sorted(df[df["level"] == 1].parent_id.unique())]

# convert to JSON
print(json.dumps(tree, indent=3, sort_keys=True))

0
投票
id_to_obj = {}
roots = []
def parse(row):
    pid, cid, lvl = row.parent_id, row.child_id, row.level
    if pid not in id_to_obj:
        # parent not created yet: create parent as root
        id_to_obj[pid] = {
            'id': pid, 'level': 0
        }
        roots.append(pid)

    # Get parent object
    parent = id_to_obj[pid]
    if not '_children' in parent:
        parent['_children'] = []

    # Create child object and add to parent's children list
    child = {
        'id': cid, 'level': lvl
    }
    parent['_children'].append(child)
    id_to_obj[cid] = child

df.sort_values(by='level').apply(parse, axis=1)

# Result
[id_to_obj[pid] for pid in roots]
© www.soinside.com 2019 - 2024. All rights reserved.