How do I create a nested dictionary object from a text file containing a tree-like directory listing?

Question (votes: 2, answers: 2)

I have a tree structure, delimited by tabs and newlines, like this:

a
\t1
\t2
\t3
\t\tb
\t\tc
\t4
\t5

And I am looking to turn this into:

{
'name': 'a',
'children': [
 {'name': '1'},
 {'name': '2'},
 {
   'name': '3',
   'children': [
      {'name': 'b'},
      {'name': 'c'}
    ]
  },
  {'name': '4'},
  {'name': '5'}
  ]
}

This is meant as the data input for a d3.js collapsible tree. I assume I have to use recursion somehow, but I can't figure out how.

I have tried converting the input into a list like this:

[('a',0), ('1',1), ('2',1), ('3',1), ('b',2), ('c',2), ('4',1), ('5',1)]

using this code:

def parser():
    # Run from the root of `retail-tree`: `python3 src/main.py`
    all_line_details = []
    with open('assets/retail') as f:
        for line in f:
            line = line.rstrip('\n ')
            # Each leading tab yields one empty string in the split result
            splitline = line.split('\t')
            # (name, depth): depth is the number of tabs in the line
            tup = (splitline[-1], len(splitline) - 1)
            all_line_details.append(tup)
            print(tup)
    return all_line_details

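Here the first element is the string itself and the second element is the number of tabs in that line. I'm not sure what the recursive step should be to accomplish this. Any help is appreciated!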

python python-3.x string parsing string-parsing
2 answers

2 votes

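You can use a function that calls re.findall with a regex that matches a line as the node's name, followed by zero or more lines that start with a tab, which are grouped as that node's children. The function then recursively builds the same structure for the children from that substring, after stripping the first tab from each of its lines: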

import re
def parser(s):
    output = []
    for name, children in re.findall(r'(.*)\n((?:\t.*\n)*)', s):
        node = {'name': name}
        if children:
            node.update({'children': parser(''.join(line[1:] for line in children.splitlines(True)))})
        output.append(node)
    return output

So given:

s = '''a
\t1
\t2
\t3
\t\tb
\t\tc
\t4
\t5
'''

parser(s)[0] returns:

{'name': 'a',
 'children': [{'name': '1'},
              {'name': '2'},
              {'name': '3', 'children': [{'name': 'b'}, {'name': 'c'}]},
              {'name': '4'},
              {'name': '5'}]}
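One caveat worth illustrating: the regex only matches lines that end with a newline, so a final line without a trailing '\n' would be silently dropped. The following is a minimal sketch, not part of the original answer, of one way to guard against that. The helper name parse_text is hypothetical; it drops blank lines and re-terminates every line before calling parser, then serializes the result with json.dumps for use as d3.js input.

```python
import json
import re


def parser(s):
    # Same regex-based parser as in the answer above, repeated so the
    # sketch runs on its own.
    output = []
    for name, children in re.findall(r'(.*)\n((?:\t.*\n)*)', s):
        node = {'name': name}
        if children:
            node['children'] = parser(
                ''.join(line[1:] for line in children.splitlines(True)))
        output.append(node)
    return output


def parse_text(text):
    # Hypothetical helper (an assumption, not from the answer): drop blank
    # lines and make sure every line, including the last, ends with '\n'.
    lines = [line for line in text.splitlines() if line.strip()]
    return parser(''.join(line + '\n' for line in lines))


# Note: this input has no trailing newline after the last line.
text = 'a\n\t1\n\t2\n\t3\n\t\tb\n\t\tc\n\t4\n\t5'
tree = parse_text(text)[0]
print(json.dumps(tree, indent=2))
```

Calling parser(text) directly on this input would return a tree without the node '5', which is why the normalization step is there.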

0 votes

Using the list structure produced by your own parser function:

def make_tree(lines, tab_count=0):
    tree = []
    index = 0
    while index < len(lines):
        if lines[index][1] == tab_count:
            node = {"name": lines[index][0]}
            children, lines_read = make_tree(lines[index + 1:], tab_count + 1)
            if children:
                node["children"] = children
                index += lines_read
            tree.append(node)
        else:
            break
        index += 1
    return tree, index

Test cases:

lines = [("a", 0), ("1", 1), ("2", 1), ("3", 1), ("b", 2), ("c", 2), ("4", 1), ("5", 1)]

test_1 = make_tree([("a", 0)])
assert test_1[0] == [{"name": "a"}], test_1
test_2 = make_tree([("a", 0), ("b", 1)])
assert test_2[0] == [{"name": "a", "children": [{"name": "b"}]}], test_2
test_3 = make_tree(lines)
expected_3 = [
    {
        "name": "a",
        "children": [
            {"name": "1"},
            {"name": "2"},
            {"name": "3", "children": [{"name": "b"}, {"name": "c"}]},
            {"name": "4"},
            {"name": "5"},
        ],
    }
]
assert test_3[0] == expected_3, test_3

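Note that the output is wrapped in a list. This handles source files with multiple root nodes (i.e. multiple lines with no leading tab), and it also keeps the recursion tidy.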
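To make the multiple-root case concrete, here is a small end-to-end sketch. It assumes tab-indented input text; the helper name to_pairs is hypothetical and stands in for the question's parser, producing the same (name, depth) tuples. make_tree is repeated from above so the block runs on its own.

```python
import json


def to_pairs(text):
    # Hypothetical helper: turn tab-indented text into (name, depth) tuples,
    # where depth is the number of leading tab characters.
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name = line.lstrip('\t')
        pairs.append((name, len(line) - len(name)))
    return pairs


def make_tree(lines, tab_count=0):
    # Same function as in the answer above.
    tree = []
    index = 0
    while index < len(lines):
        if lines[index][1] == tab_count:
            node = {"name": lines[index][0]}
            children, lines_read = make_tree(lines[index + 1:], tab_count + 1)
            if children:
                node["children"] = children
                index += lines_read
            tree.append(node)
        else:
            break
        index += 1
    return tree, index


# Two root nodes, 'a' and 'b', each with its own subtree.
text = 'a\n\t1\n\t\tx\nb\n\t2\n'
pairs = to_pairs(text)
forest, consumed = make_tree(pairs)

# If consumed were smaller than len(pairs), make_tree stopped early, for
# example on a line indented more than one level deeper than its parent.
assert consumed == len(pairs)
print(json.dumps(forest))
```

The second return value is easy to overlook, but comparing it with the number of input tuples is a cheap way to detect malformed indentation, since make_tree stops at the first line it cannot place rather than raising an error.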
