I have a CSV file and want to build a tree from its contents:
id | screen_name | reply_status_id | tweet | stance
1 | a | null | dahgfsjhg | +
2 | b | 1 | fcjgvujhgjhk | -
3 | c | 2 | ououoijoskjfpokpo | +
4 | d | 1 | giuyhewikuhieuhi | +
5 | e | 3 | hkjhkjlkjljlkjlj | -
I want to build a tree of tweets in which each row is attached to its parent through id and reply_status_id.
Like this:
                a  [root]
(stance of b) -/ \+ (stance of d)
              b   d  [children]
            +/
            c
          -/
          e
My CSV file is at
drive.google.com/open?id=1Z8paHBJVv6FJWeskYyR_6AhB2Ifor4Mz.
I built the tree with this code:
import csv
from anytree import Node
from anytree.exporter import DotExporter

def find_subnodes(root_node, root_node_id, nodes):
    for row in lst:
        node_id = row[0]
        # name = regex.sub('', row[3])
        name = row[3].replace('\\"', '\'').replace('"', '')
        parent_node_id = row[2]
        if root_node_id == parent_node_id:
            node = Node(name, root_node)
            nodes[node_id] = node
            nodes = find_subnodes(node, node_id, nodes)
    return nodes

with open('rumour1.csv') as f:
    reader = csv.reader(f)
    next(reader)
    lst = list(reader)

r_node = Node(lst[0][3].replace('\\"', '\'').replace('"', ''))
n = {lst[0][0]: r_node}
n = find_subnodes(r_node, lst[0][0], n)
DotExporter(r_node).to_picture('tree.png')
However, this code does not label the edges. Can anyone help me add edge labels taken from the stance column (the last column in the CSV)?
With the column command (from util-linux):
$ cat input.csv
id,screen_name,reply_status_id
1,a,null
2,b,1
3,c,2
4,d,1
5,e,3
$ tail -n+2 input.csv | column -t -s, -i1 -p3 -r2 -H1,3
a
├─b
│ └─c
│   └─e
└─d
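If column is not available, or the stance should appear on each branch, the same parent/child grouping can be sketched in plain Python. This is only a minimal sketch using the standard library: the helper names build_children and render_tree are made up for illustration, the rows are inlined instead of being read from input.csv, and a stance column is assumed to be present as in the question.

```python
import csv
import io
from collections import defaultdict

# Sample rows inlined for illustration; a real script would open the CSV file.
CSV_TEXT = """id,screen_name,reply_status_id,stance
1,a,null,+
2,b,1,-
3,c,2,+
4,d,1,+
5,e,3,-
"""

def build_children(rows):
    """Map each parent id to the list of rows that reply to it."""
    children = defaultdict(list)
    for row in rows:
        children[row["reply_status_id"]].append(row)
    return children

def render_tree(children, parent_id="null", prefix="", is_root=True):
    """Return the tree as a list of text lines, one line per node."""
    lines = []
    siblings = children.get(parent_id, [])
    for i, row in enumerate(siblings):
        last = i == len(siblings) - 1
        if is_root:
            # The root has no incoming edge, so no stance is shown for it.
            lines.append(row["screen_name"])
            child_prefix = ""
        else:
            branch = "└─" if last else "├─"
            # The stance of a reply labels the edge leading to it.
            lines.append(f"{prefix}{branch}({row['stance']}) {row['screen_name']}")
            child_prefix = prefix + ("  " if last else "│ ")
        lines.extend(render_tree(children, row["id"], child_prefix, is_root=False))
    return lines

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
tree_lines = render_tree(build_children(rows))
print("\n".join(tree_lines))
```

For the sample rows this prints the same shape as the column output, with (-) in front of b and e and (+) in front of c and d. The sketch assumes the root row is the one whose reply_status_id is the literal string null, as in the sample data.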
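You can do this in two steps: first convert the DataFrame into a tree structure using the parent-child relation, then print the tree. With bigtree, this takes two lines of code.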
# Set up (I only took the relevant columns)
import pandas as pd

data = pd.DataFrame([
    [1, "a", None, "+"],
    [2, "b", 1, "-"],
    [3, "c", 2, "+"],
    [4, "d", 1, "+"],
    [5, "e", 3, "-"]
], columns=["id", "screen_name", "reply_status_id", "stance"]
)

# Some preprocessing
rename_dict = dict(zip(data["id"], data["screen_name"]))
data["id"] = data["id"].replace(rename_dict)
data["reply_status_id"] = data["reply_status_id"].replace(rename_dict)

# 1. Convert dataframe into a tree structure
from bigtree import dataframe_to_tree_by_relation
node = dataframe_to_tree_by_relation(
    data,
    child_col="id",
    parent_col="reply_status_id",
)

# 2. Print/show the tree
node.show()
This produces:
a
├── b
│   └── c
│       └── e
└── d
If you want to show the stance, you can export the tree to an image whose edges are labeled with it.
from bigtree import tree_to_dot
graph = tree_to_dot(node, edge_attr=lambda x: {"label": x.stance})
graph.write_png("tree_stance.png")
Disclaimer: I am the author of bigtree :)
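For readers who would rather not depend on a tree library, stance-labeled edges can also be produced by writing Graphviz DOT text directly. The following is a minimal sketch under stated assumptions, not the approach of either answer: rows_to_dot is a hypothetical helper, the rows are inlined, labels are not escaped (tweet text containing double quotes would need escaping first), and the root is taken to be the row whose reply_status_id is the string null.

```python
import csv
import io

# Sample rows inlined for illustration; the real file has the same columns
# plus the tweet text.
CSV_TEXT = """id,screen_name,reply_status_id,stance
1,a,null,+
2,b,1,-
3,c,2,+
4,d,1,+
5,e,3,-
"""

def rows_to_dot(rows, root_marker="null"):
    """Build Graphviz DOT source with one stance-labeled edge per reply."""
    lines = ["digraph tree {"]
    for row in rows:
        # One node per tweet, keyed by id and displayed by screen_name.
        lines.append(f'    "{row["id"]}" [label="{row["screen_name"]}"];')
    for row in rows:
        parent = row["reply_status_id"]
        if parent == root_marker:
            continue  # the root has no incoming edge
        # The edge into a reply carries that reply's stance.
        lines.append(f'    "{parent}" -> "{row["id"]}" [label="{row["stance"]}"];')
    lines.append("}")
    return "\n".join(lines)

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
dot_source = rows_to_dot(rows)
print(dot_source)
```

Saving the printed text as tree.dot and running dot -Tpng tree.dot -o tree.png (with Graphviz installed) should render the labeled picture. If you stay with anytree as in the question, its DotExporter also accepts an edgeattrfunc callback that receives the parent and child nodes and returns an attribute string, which may be the smaller change to the original code, provided each Node stores its stance.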