Given the following pandas DataFrame (the original DataFrame has more than 200 rows):
import pandas as pd
df = pd.DataFrame({
    'child': ['Europe', 'France', 'Paris', 'North America', 'US', 'Canada'],
    'parent': ['', 'Europe', 'France', '', 'North America', 'North America'],
    'value': [746.4, 67.75, 2.16, 579, 331.9, 38.25]
})
df
|---+---------------+---------------+--------|
| | child | parent | value |
|---+---------------+---------------+--------|
| 0 | Europe | | 746.40 |
| 1 | France | Europe | 67.75 |
| 2 | Paris | France | 2.16 |
| 3 | North America | | 579.00 |
| 4 | US | North America | 331.90 |
| 5 | Canada | North America | 38.25 |
|---+---------------+---------------+--------|
I would like to generate the following JSON tree:
[
  {
    name: 'Europe',
    value: 746.4,
    children: [
      {
        name: 'France',
        value: 67.75,
        children: [
          {
            name: 'Paris',
            value: 2.16
          }
        ]
      }
    ]
  },
  {
    name: 'North America',
    value: 579,
    children: [
      {
        name: 'US',
        value: 331.9
      },
      {
        name: 'Canada',
        value: 38.25
      }
    ]
  }
];
This tree will be used as input for an ECharts visualization, for example a basic sunburst chart.
You can use the networkx package for this. First, convert the DataFrame to a graph:
import networkx as nx
G = nx.from_pandas_edgelist(df, source='parent', target='child', edge_attr='value', create_using=nx.DiGraph)
nx.draw(G, with_labels=True)
Next, get the graph as a tree in JSON format:
from networkx.readwrite import json_graph
data = json_graph.tree_data(G, root='')
data = data['children'] # remove the root
This looks as follows:
[{'id': 'Europe',
  'children': [{'id': 'France', 'children': [{'id': 'Paris'}]}]},
 {'id': 'North America', 'children': [{'id': 'US'}, {'id': 'Canada'}]}]
Finally, post-process the JSON data by adding the values back and renaming 'id' to 'name'. There may be a better way to do this, but the following works.
edge_values = nx.get_edge_attributes(G, 'value')

def post_process_json(data, parent=''):
    # Rename 'id' to 'name' and restore the value stored on the (parent, child) edge.
    data['name'] = data.pop('id')
    data['value'] = edge_values[(parent, data['name'])]
    if 'children' in data:
        # Recurse, passing this node's name down as the parent of its children.
        data['children'] = [post_process_json(child, parent=data['name']) for child in data['children']]
    return data

data = [post_process_json(d) for d in data]
Final result:
[{'children': [{'children': [{'name': 'Paris', 'value': 2.16}],
                'name': 'France',
                'value': 67.75}],
  'name': 'Europe',
  'value': 746.4},
 {'children': [{'name': 'US', 'value': 331.9},
               {'name': 'Canada', 'value': 38.25}],
  'name': 'North America',
  'value': 579.0}]
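The final result is still a Python list of dicts, so it has to be serialized before ECharts can consume it. Here is a minimal sketch using the standard-library json module; the tree is typed out literally so the snippet runs on its own, and the variable name payload is just my choice for illustration.

```python
import json

# The tree from the final result above, written out literally
# so this snippet runs without pandas or networkx.
data = [
    {"name": "Europe", "value": 746.4, "children": [
        {"name": "France", "value": 67.75, "children": [
            {"name": "Paris", "value": 2.16},
        ]},
    ]},
    {"name": "North America", "value": 579.0, "children": [
        {"name": "US", "value": 331.9},
        {"name": "Canada", "value": 38.25},
    ]},
]

# ensure_ascii=False keeps non-ASCII names readable in the output;
# indent=2 is only for human inspection and can be dropped.
payload = json.dumps(data, ensure_ascii=False, indent=2)
print(payload)
```

The resulting string can be written to a file or embedded in the page and assigned to the series data of the chart.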
You could first create the individual nodes as { name, value } dictionaries, keyed by their name. Then link them up:
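result = []
# The empty-string key is a placeholder root whose children are the top-level nodes.
d = { "": { "children": result } }
# First pass: create one node dict per row, keyed by name.
for child, value in zip(df["child"], df["value"]):
    d[child] = { "name": child, "value": value }
# Second pass: append each node to its parent's children list.
for child, parent in zip(df["child"], df["parent"]):
    if "children" not in d[parent]:
        d[parent]["children"] = []
    d[parent]["children"].append(d[child])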
For our example, result would be:
[{
    'name': 'Europe',
    'value': 746.4,
    'children': [{
        'name': 'France',
        'value': 67.75,
        'children': [{
            'name': 'Paris',
            'value': 2.16
        }]
    }]
}, {
    'name': 'North America',
    'value': 579.0,
    'children': [{
        'name': 'US',
        'value': 331.9
    }, {
        'name': 'Canada',
        'value': 38.25
    }]
}]
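With 200+ rows it may be worth wrapping the two passes in a function that fails loudly when a row names a parent that never appears as a child, rather than raising a bare KeyError. The following is only a sketch under that assumption: build_tree and its error messages are names I made up, and the rows are plain tuples so it runs without pandas.

```python
def build_tree(rows, root=""):
    """Link (child, parent, value) rows into a nested list of node dicts."""
    rows = list(rows)  # allow generators such as zip(...)
    result = []
    nodes = {root: {"children": result}}  # placeholder root holding the top level

    # First pass: one dict per node, keyed by name.
    for child, _parent, value in rows:
        if child in nodes:
            raise ValueError(f"duplicate node name: {child!r}")
        nodes[child] = {"name": child, "value": value}

    # Second pass: attach each node to its parent.
    for child, parent, _value in rows:
        if parent not in nodes:
            raise ValueError(f"{child!r} has unknown parent {parent!r}")
        nodes[parent].setdefault("children", []).append(nodes[child])

    return result


# Same data as the example DataFrame. With the DataFrame itself this would be
# build_tree(zip(df["child"], df["parent"], df["value"])).
rows = [
    ("Europe", "", 746.4),
    ("France", "Europe", 67.75),
    ("Paris", "France", 2.16),
    ("North America", "", 579),
    ("US", "North America", 331.9),
    ("Canada", "North America", 38.25),
]
tree = build_tree(rows)
```

Because all nodes are created before any are linked, the rows do not need to be sorted so that parents come before their children.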