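I have a DataFrame with employee email, manager email, and manager hierarchy columns. I am trying to get the number of teams each manager has.
My current DataFrame: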
emp_email mgr_email mgr_hier_01 mgr_hier_02 mgr_hier_03
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected] NAN NAN
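What I want is a column that, if the employee is a manager, gives the number of teams that manager has. For example, [email protected] has 2 managers reporting to her ([email protected] and [email protected]), so the number of teams under her should be 2. [email protected], on the other hand, has no managers reporting to him, but he is a manager with 2 individual contributors ([email protected] and [email protected]), so the number of teams under [email protected] should be 1.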
emp_email mgr_email mgr_hier_01 mgr_hier_02 mgr_hier_03 num_teams_if_mgr
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] [email protected] 0
[email protected] [email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] 2
[email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] 2
[email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] [email protected] 1
[email protected] [email protected] [email protected] [email protected] 1
[email protected] NAN NAN 6
So far I have only been able to create the hierarchy columns for the DataFrame, using the code below. Any help is appreciated.
import networkx as nx
import pandas as pd
# create graph
G = nx.from_pandas_edgelist(df_hc, source='mgr_email', target='emp_email', create_using=nx.DiGraph)
# find roots (= top managers)
roots = [n for n,d in G.in_degree() if d==0]
# for each employee, find the hierarchy
df_hierarchy = pd.DataFrame(
    [next((p for root in roots for p in nx.all_simple_paths(G, root, node)), [])[:-1]
     for node in df_hc['emp_email']],
    index=df_hc.index,
).rename(columns=lambda x: f'mgr_hier_{x+1:02d}')
# join to original DataFrame
df_hc2 = df_hc.join(df_hierarchy)
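I don't fully understand your concept of a team, but assuming a team has more than one person, count the direct descendants that are not leaves: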
leafs = {n for n,d in G.out_degree() if d==0}
d = {n: len(nx.descendants_at_distance(G, n, 1)-leafs)
for n in G.nodes}
df_hc['num_teams_if_mgr'] = df_hc['emp_email'].map(d)
Output:
emp_email mgr_email mgr_hier_01 mgr_hier_02 mgr_hier_03 num_teams_if_mgr
0 [email protected] [email protected] [email protected] [email protected] [email protected] 0
1 [email protected] [email protected] [email protected] [email protected] [email protected] 0
2 [email protected] [email protected] [email protected] [email protected] None 0
3 [email protected] [email protected] [email protected] [email protected] None 0
4 [email protected] [email protected] [email protected] [email protected] [email protected] 0
5 [email protected] [email protected] [email protected] [email protected] None 0
6 [email protected] [email protected] [email protected] [email protected] None 0
7 [email protected] [email protected] [email protected] [email protected] [email protected] 0
8 [email protected] [email protected] [email protected] [email protected] None 0
9 [email protected] [email protected] [email protected] [email protected] [email protected] 0
10 [email protected] [email protected] [email protected] [email protected] [email protected] 0
11 [email protected] [email protected] [email protected] [email protected] None 0
12 [email protected] [email protected] [email protected] None None 1
13 [email protected] [email protected] [email protected] None None 0
14 [email protected] [email protected] [email protected] [email protected] None 0
15 [email protected] [email protected] [email protected] None None 1
16 [email protected] [email protected] [email protected] None None 1
17 [email protected] [email protected] [email protected] None None 0
18 [email protected] [email protected] [email protected] [email protected] None 0
19 [email protected] [email protected] [email protected] None None 0
20 [email protected] [email protected] [email protected] [email protected] None 0
21 [email protected] [email protected] [email protected] [email protected] None 0
22 [email protected] NAN NAN None None 6
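To make the counting rule easier to check by hand, here is a minimal sketch on a small made-up hierarchy. The names `df_toy`, `G_toy`, `ceo`, `vp`, `m1`, `m2`, `a`, `b` and `c` are hypothetical and are not taken from the question's data.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy hierarchy: ceo -> vp -> (m1, m2); m1 -> (a, b); m2 -> c
df_toy = pd.DataFrame({
    "emp_email": ["vp", "m1", "m2", "a", "b", "c"],
    "mgr_email": ["ceo", "vp", "vp", "m1", "m1", "m2"],
})

# Edges point from manager to employee
G_toy = nx.from_pandas_edgelist(
    df_toy, source="mgr_email", target="emp_email", create_using=nx.DiGraph
)

# Leaves are people with no direct reports
leaves = {n for n, deg in G_toy.out_degree() if deg == 0}

# For each node, count the direct reports that manage someone themselves
non_leaf_reports = {
    n: len(nx.descendants_at_distance(G_toy, n, 1) - leaves) for n in G_toy.nodes
}

df_toy["num_teams_if_mgr"] = df_toy["emp_email"].map(non_leaf_reports)
print(df_toy)
```

In this toy data only `vp` gets a non-zero value (2, for `m1` and `m2`); `m1` and `m2` get 0 because all of their reports are leaves.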
If an employee who is not a leaf should also count as +1:
leafs = {n for n,d in G.out_degree() if d==0}
d = {n: len(nx.descendants_at_distance(G, n, 1)-leafs)
+ (n not in leafs) for n in G.nodes}
df_hc['num_teams_if_mgr'] = df_hc['emp_email'].map(d)
Output:
emp_email mgr_email mgr_hier_01 mgr_hier_02 mgr_hier_03 num_teams_if_mgr
0 [email protected] [email protected] [email protected] [email protected] [email protected] 0
1 [email protected] [email protected] [email protected] [email protected] [email protected] 0
2 [email protected] [email protected] [email protected] [email protected] None 0
3 [email protected] [email protected] [email protected] [email protected] None 0
4 [email protected] [email protected] [email protected] [email protected] [email protected] 0
5 [email protected] [email protected] [email protected] [email protected] None 0
6 [email protected] [email protected] [email protected] [email protected] None 0
7 [email protected] [email protected] [email protected] [email protected] [email protected] 0
8 [email protected] [email protected] [email protected] [email protected] None 0
9 [email protected] [email protected] [email protected] [email protected] [email protected] 0
10 [email protected] [email protected] [email protected] [email protected] [email protected] 0
11 [email protected] [email protected] [email protected] [email protected] None 0
12 [email protected] [email protected] [email protected] None None 2
13 [email protected] [email protected] [email protected] None None 1
14 [email protected] [email protected] [email protected] [email protected] None 1
15 [email protected] [email protected] [email protected] None None 2
16 [email protected] [email protected] [email protected] None None 2
17 [email protected] [email protected] [email protected] None None 1
18 [email protected] [email protected] [email protected] [email protected] None 1
19 [email protected] [email protected] [email protected] None None 1
20 [email protected] [email protected] [email protected] [email protected] None 1
21 [email protected] [email protected] [email protected] [email protected] None 0
22 [email protected] NAN NAN None None 7
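Another possible reading of the question's examples is that a manager with only individual contributors counts as 1 team, and a manager with managers reporting to them counts one team per such report. The sketch below assumes that interpretation, which the question does not state explicitly, and again uses hypothetical toy names (`df_toy`, `G_toy`, `num_teams`).

```python
import networkx as nx
import pandas as pd

# Hypothetical toy hierarchy: ceo -> vp -> (m1, m2); m1 -> (a, b); m2 -> c
df_toy = pd.DataFrame({
    "emp_email": ["vp", "m1", "m2", "a", "b", "c"],
    "mgr_email": ["ceo", "vp", "vp", "m1", "m1", "m2"],
})

G_toy = nx.from_pandas_edgelist(
    df_toy, source="mgr_email", target="emp_email", create_using=nx.DiGraph
)

# Leaves are people with no direct reports
leaves = {n for n, deg in G_toy.out_degree() if deg == 0}


def num_teams(node):
    """Assumed rule: 0 for non-managers, otherwise at least 1 team."""
    if node in leaves:
        return 0
    manager_reports = set(G_toy.successors(node)) - leaves
    # A manager with only individual contributors still has one team
    return max(len(manager_reports), 1)


teams = {n: num_teams(n) for n in G_toy.nodes}
df_toy["num_teams_if_mgr"] = df_toy["emp_email"].map(teams)
print(df_toy)
```

Under this assumed rule, `vp` gets 2, `m1` and `m2` each get 1, and the individual contributors get 0.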
Graph: [image not included]