我有一个已展平的表格(请参阅下面的表 1)来表示分层树。我需要创建等效的嵌套树(参见下面的图 1)。
有这样做的通用算法或概念吗?
我正在使用 Python 3.11 并对
anytree
库进行子类化。但是,转换为嵌套的 Python 字典(采用 anytree
的预期格式)也是一种解决方案。
我的代码如下面的 Listing 1 所示,但它非常粗糙...我所知道的是我需要弄清楚
lvl
何时发生变化,然后通过递归将这些对象添加到其父对象中。
我尝试过递归处理它,但我不知道何时递归、要传递哪些数据以及我的返回条件是什么。请参阅下面清单 1中的代码。
我的数据结构类似于邻接列表,但它是否足够相似,可以使用代码来实现?
索引 | lvl | 第 # 部分 |
---|---|---|
1 | 0 | 101 |
2 | 1 | 101-1 |
3 | 2 | 101-1A |
4 | 2 | 101-1B |
5 | 2 | 101-1C |
6 | 2 | 101-1D |
7 | 1 | 101-2 |
8 | 2 | 101-2A |
9 | 2 | 101-2B |
10 | 2 | 101-2C |
表1数据排列如下:
"""
Object `TreeNode` is a subclass of `anytree.AnyNode`. The object has
attributes:
- `index`: index number (row number)
- `lvl`: hierarchy level
- `parent`: The parent TreeNode of the object.
Parameter `tn_list` is the list of TreeNode objects
Parameter `parent_node` is the root node of the tree. It is a TreeNode object.
Parameter `current_level` keeps up with the current "area" in the tree we are
processing.
[??] I'm not sure if this will be the parent's level, current
node's level, etc.
Parameter `children` is a list of TreeNode objects which are children of
`parent_node`.
[??] I don't think I need this.
"""
def link_TreeNodes(tn_list=None, parent_node=None, children=None,
current_level=None):
temp_list = []
children_nodes = []
# Last node's lvl to determine when we need to stop and add all the
# children to the parent.
last_lvl = None
for node in temp_list:
lvl = getattr(node, 'lvl')
if last_lvl is None:
last_lvl = lvl
else:
last_lvl = lvl - 1
if lvl == current_level:
node.parent = parent_node
elif lvl == current_level + 1:
children_nodes.append(node)
elif lvl == current_level - 1:
# [??] I'm a lost as to what should happen here...
print(f'current_level: {current_level}; lvl: {lvl}')
# Determine if we need to stop
if last_lvl == lvl:
stop = False
continue
elif last_lvl == lvl + 1:
print('Stopping to process children nodes...')
stop = True
# We have detected a change in the table
if stop:
if children_nodes:
parent_node = link_TreeNodes(
tn_list=children_nodes, parent_node=parent_node,
current_level=parent_node.lvl+1)
else:
return parent_node
stop = False
# [??] What is my end condition? What do I return from the recursion?
return parent_node
test_data = {'index': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'level': [0, 1, 2, 2, 2, 2, 1, 2, 2, 2], 'part_number': ['101', '101-1', '101-1A', '101-1B', '101-1C', '101-1D', '101-2', '101-2A', '101-2B', '101-2C']}
correct_output = {
'101': {'parent': None, 'children': ['101-1', '101-2']},
'101-1': {'parent': '101', 'children': ['101-1A', '101-1B', '101-1C', '101-1D']},
'101-1A': {'parent': '101-1', 'children': None},
'101-1B': {'parent': '101-1', 'children': None},
'101-1C': {'parent': '101-1', 'children': None},
'101-1D': {'parent': '101-1', 'children': None},
'101-2': {'parent': '101', 'children': ['101-2A', '101-2B', '101-2C']},
'101-2A': {'parent': '101-2', 'children': None},
'101-2B': {'parent': '101-2', 'children': None},
'101-2C': {'parent': '101-2', 'children': None}
}
假设上述输入数据在文件
fn
中为纯 .csv 数据。
def data_reader(fn):
data = {}
with open(fn, newline='') as csvfile:
creader = csv.reader(csvfile, delimiter='\t', quotechar='"')
for i, row in enumerate(creader):
if i == 0:
continue
data[int(row[0])] = row[1:]
return data
def convert2TreeNode(data_dict):
nodes = []
for k, v in data_dict.items():
nodes.append(TreeNode.list2TreeNode(dict_key=k, dict_vals=v))
return nodes
# if __name__ == '__main__':
nodes = convert2TreeNode(data_reader(fn))