Python - Create manager hierarchy columns from employee name and manager email columns

Question

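I'm trying to create manager hierarchy columns using only the employee ID (Name) and manager email address columns. The Name column has unique values, while the Mgr_Email column contains duplicates (i.e. each employee has exactly one manager, but one manager can have multiple reports in their organization).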

The data looks like this:

Name         Mgr_Email          

Sally Po     [email protected]
Sean Sea     [email protected]   
Jacob Hin    [email protected] 
Tim Buick    [email protected]
Kris Olt     [email protected]
Cindy Myers  [email protected]
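To experiment with this, the sample can be rebuilt as a DataFrame. The real addresses are not visible in the post, so this minimal sketch assumes a hypothetical hierarchy with first.last@example.com placeholder addresses (including a hypothetical top manager, ann.lee@example.com). It also checks the two properties described above: Name is unique, while Mgr_Email repeats.

```python
import pandas as pd

# Hypothetical sample: the real addresses are not visible in the post,
# so first.last@example.com placeholders are assumed.
df = pd.DataFrame({
    "Name": ["Sally Po", "Sean Sea", "Jacob Hin", "Tim Buick", "Kris Olt", "Cindy Myers"],
    "Mgr_Email": [
        "ann.lee@example.com",   # assumed top manager, not an employee row itself
        "sally.po@example.com",
        "kris.olt@example.com",
        "sean.sea@example.com",
        "sean.sea@example.com",
        "sally.po@example.com",
    ],
})

# Each employee appears exactly once...
assert df["Name"].is_unique
# ...but a manager can appear many times (one row per direct report)
reports_per_manager = df["Mgr_Email"].value_counts()
print(reports_per_manager)
```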

The desired result looks like this. The manager hierarchy can have many levels, not just the 4 hierarchy columns shown in the example.

Name         Mgr_Email          Mgr_Lvl_01           Mgr_Lvl_02      Mgr_Lvl_03       Mgr_Lvl_04

Sally Po     [email protected]  [email protected]  
Sean Sea     [email protected]    [email protected]  [email protected]
Jacob Hin    [email protected]   [email protected]  [email protected] [email protected] [email protected] 
Tim Buick    [email protected]    [email protected]  [email protected] [email protected]
Kris Olt     [email protected] [email protected]  [email protected] [email protected]
Cindy Myers  [email protected]    [email protected]  [email protected]

I've tried the following, but it doesn't work:

i=1
df['Level 0'] = df['Manager Email Address']

while df.notna().sum().ne(1).all():
    df[f'Mgr_Lvl {i}'] = df[f'Mgr_Lvl {i-1}'].map(df.set_index('Name')['Mgr_Email'])
    i+=1

df = df.drop('Level 0',axis=1)
df['Mgr_Lvl_01'] = df.loc[:,f'Mgr_Level {i-1}'].ffill().bfill()
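For reference: the loop above looks manager email addresses up in an index of names (df.set_index('Name')), so the lookups never match, and it refers to column names that are never created ('Manager Email Address' instead of 'Mgr_Email', 'Mgr_Lvl 0' and 'Mgr_Level {i-1}' instead of 'Level 0'). Below is a minimal pandas-only sketch of the same climb-one-level-per-iteration idea. It assumes, hypothetically, that each employee's own address can be derived as first.last@example.com, because climbing the chain requires an email -> manager-email lookup.

```python
import pandas as pd

# Hypothetical placeholder data (real addresses are not visible in the post)
df = pd.DataFrame({
    "Name": ["Sally Po", "Sean Sea", "Jacob Hin", "Tim Buick", "Kris Olt", "Cindy Myers"],
    "Mgr_Email": ["ann.lee@example.com", "sally.po@example.com", "kris.olt@example.com",
                  "sean.sea@example.com", "sean.sea@example.com", "sally.po@example.com"],
})

# Hypothetical: derive each employee's own address from Name
df["Email"] = (df["Name"].str.lower()
               .str.replace(r"(\w+) (\w+)", r"\1.\2@example.com", regex=True))

# Lookup table keyed by email (not by name): employee email -> manager email
mgr_of = df.set_index("Email")["Mgr_Email"]

# chain[k] holds the manager k+1 levels above each employee (NaN once past the top)
chain = [df["Mgr_Email"]]
for _ in range(len(df)):                 # cap: an acyclic chain cannot be longer than the row count
    nxt = chain[-1].map(mgr_of)          # addresses missing from the lookup become NaN
    if nxt.isna().all():
        break
    chain.append(nxt)

# Per row: drop the NaN padding and reverse so that level 01 is the top manager
levels = pd.concat(chain, axis=1)
rows = [r.dropna().tolist()[::-1] for _, r in levels.iterrows()]

df2 = pd.DataFrame(rows, index=df.index).rename(columns=lambda x: f"Mgr_Lvl_{x + 1:02d}")
out = df.drop(columns="Email").join(df2)
print(out)
```

Capping the loop at len(df) iterations means a cycle in the data (e.g. two people listed as each other's manager) cannot cause an endless loop.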

I'd appreciate any help I can get. Thanks.

python-3.x pandas dataframe hierarchy feature-engineering
Answer

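Following up on the comments: the approach below works for small datasets but is not efficient for large ones. See this more efficient alternative, also based on networkx.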

You can't easily solve this with pandas alone; you need to treat it as a graph problem.

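Your data forms a directed graph: each manager address points to the employees who report to it, and the top managers are the nodes with no incoming edge.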
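The same graph can be inspected without networkx as an adjacency list. This small sketch uses hypothetical placeholder data (the real addresses are not visible) and the assumed first.last@example.com naming to list each manager's direct reports and find the top managers, i.e. addresses that appear as a manager but never as an employee.

```python
import pandas as pd

# Hypothetical placeholder data
df = pd.DataFrame({
    "Name": ["Sally Po", "Sean Sea", "Jacob Hin", "Tim Buick", "Kris Olt", "Cindy Myers"],
    "Mgr_Email": ["ann.lee@example.com", "sally.po@example.com", "kris.olt@example.com",
                  "sean.sea@example.com", "sean.sea@example.com", "sally.po@example.com"],
})

# Adjacency list of the directed graph: manager email -> list of direct reports
reports = df.groupby("Mgr_Email")["Name"].apply(list).to_dict()

for manager, direct in sorted(reports.items()):
    print(f"{manager} -> {direct}")

# Top managers: addresses that appear as a manager but never as an employee
# (assumes the hypothetical first.last@example.com naming)
employee_emails = set(df["Name"].str.lower().str.replace(" ", ".") + "@example.com")
roots = sorted(set(reports) - employee_emails)
print(roots)  # ['ann.lee@example.com']
```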

A useful tool for this is networkx:

import pandas as pd
import networkx as nx

# make an email address from Name
# (ideally you would already have an identifier that maps names to emails)
# example.com is a placeholder domain: replace it with your organization's domain
df['Email'] = df['Name'].str.lower().str.replace(r'(\w+) (\w+)', r'\1.\2@example.com', regex=True)

# create the graph (one edge from each manager to each direct report)
G = nx.from_pandas_edgelist(df, source='Mgr_Email', target='Email',
                            create_using=nx.DiGraph)

# find roots (= top managers, i.e. nodes with no incoming edge)
roots = [n for n, d in G.in_degree() if d == 0]
# ['[email protected]']

# for each employee, take the path from a root down to the employee
# and drop the last element (the employee itself)
df2 = (pd.DataFrame([next((p for root in roots for p in nx.all_simple_paths(G, root, node)), [])[:-1]
                     for node in df['Email']], index=df.index)
         .rename(columns=lambda x: f'Mgr_Lvl_{x+1:02d}')
      )

# join to the original DataFrame
out = df.drop(columns='Email').join(df2)
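To see what the list comprehension does for a single employee, here is a toy graph with hypothetical node names. In a tree there is exactly one simple path from a root to each node; slicing off the last element leaves the managers ordered from the top down, which is why Mgr_Lvl_01 holds the top manager. If no root reaches a node, next(..., []) falls back to an empty list, so that row gets no level columns filled.

```python
import networkx as nx

# Toy org chart with hypothetical names: top -> a -> b -> c
G = nx.DiGraph([("top", "a"), ("a", "b"), ("b", "c")])

roots = [n for n, d in G.in_degree() if d == 0]
paths = list(nx.all_simple_paths(G, roots[0], "c"))
print(paths)  # [['top', 'a', 'b', 'c']]

# Drop the employee itself; the rest become Mgr_Lvl_01, Mgr_Lvl_02, ...
level_cols = {f"Mgr_Lvl_{i + 1:02d}": m for i, m in enumerate(paths[0][:-1])}
print(level_cols)  # {'Mgr_Lvl_01': 'top', 'Mgr_Lvl_02': 'a', 'Mgr_Lvl_03': 'b'}
```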

Output:

          Name            Mgr_Email          Mgr_Lvl_01        Mgr_Lvl_02           Mgr_Lvl_03         Mgr_Lvl_04
0     Sally Po   [email protected]  [email protected]              None                 None               None
1     Sean Sea     [email protected]  [email protected]  [email protected]                 None               None
2    Jacob Hin    [email protected]  [email protected]  [email protected]     [email protected]  [email protected]
3    Tim Buick     [email protected]  [email protected]  [email protected]     [email protected]               None
4     Kris Olt  [email protected]  [email protected]  [email protected]  [email protected]               None
5  Cindy Myers     [email protected]  [email protected]  [email protected]                 None               None
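Calling nx.all_simple_paths separately for every employee repeats a graph search per row, which is what makes the approach above slow on large data. One possible faster variant (a sketch, not necessarily the alternative referred to above) runs a single breadth-first search per root with nx.single_source_shortest_path, which returns the path to every reachable node at once; in a tree, the shortest path is the only path. The data and the example.com domain are hypothetical placeholders.

```python
import networkx as nx
import pandas as pd

# Hypothetical placeholder data
df = pd.DataFrame({
    "Name": ["Sally Po", "Sean Sea", "Jacob Hin", "Tim Buick", "Kris Olt", "Cindy Myers"],
    "Mgr_Email": ["ann.lee@example.com", "sally.po@example.com", "kris.olt@example.com",
                  "sean.sea@example.com", "sean.sea@example.com", "sally.po@example.com"],
})
df["Email"] = (df["Name"].str.lower()
               .str.replace(r"(\w+) (\w+)", r"\1.\2@example.com", regex=True))

G = nx.from_pandas_edgelist(df, source="Mgr_Email", target="Email",
                            create_using=nx.DiGraph)
roots = [n for n, d in G.in_degree() if d == 0]

# One BFS per root: dict mapping every reachable node -> path from that root
paths = {}
for root in roots:
    paths.update(nx.single_source_shortest_path(G, root))

# Drop the employee itself from the end of each path; unreachable nodes -> []
rows = [paths.get(e, [])[:-1] for e in df["Email"]]
df2 = pd.DataFrame(rows, index=df.index).rename(columns=lambda x: f"Mgr_Lvl_{x + 1:02d}")
out = df.drop(columns="Email").join(df2)
print(out)
```

The total work is roughly one traversal of the graph per root instead of one search per employee.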