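I am trying to create manager hierarchy columns using only the employee ID and manager email address columns. The Name column has unique values, while the Mgr_Email column contains duplicates (i.e. each employee has one manager, but a manager can have multiple reports in their organization).
The data looks like this: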
Name Mgr_Email
Sally Po [email protected]
Sean Sea [email protected]
Jacob Hin [email protected]
Tim Buick [email protected]
Kris Olt [email protected]
Cindy Myers [email protected]
The desired result looks like this. The manager hierarchy can have many levels, not just the 4 hierarchy columns shown in the example.
Name Mgr_Email Mgr_Lvl_01 Mgr_lvl_02 Mgr_lvl_03 Mgr_lvl_04
Sally Po [email protected] [email protected]
Sean Sea [email protected] [email protected] [email protected]
Jacob Hin [email protected] [email protected] [email protected] [email protected] [email protected]
Tim Buick [email protected] [email protected] [email protected] [email protected]
Kris Olt [email protected] [email protected] [email protected] [email protected]
Cindy Myers [email protected] [email protected] [email protected]
I have tried this, but it does not work:
i=1
df['Level 0'] = df['Manager Email Address']
while df.notna().sum().ne(1).all():
    df[f'Mgr_Lvl {i}'] = df[f'Mgr_Lvl {i-1}'].map(df.set_index('Name')['Mgr_Email'])
    i+=1
df = df.drop('Level 0',axis=1)
df['Mgr_Lvl_01'] = df.loc[:,f'Mgr_Level {i-1}'].ffill().bfill()
I would appreciate any help I can get, thanks.
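Following up on a comment: the approach below works for small datasets but is not efficient for large ones. See this more efficient alternative, also based on networkx.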
This cannot be solved easily with pandas alone; you need to treat it as a graph problem.
Your data forms a directed graph of managers and their reports. Using networkx:
# make email address from Name
# (best would be to already have an identifier to map names)
df['Email'] = df['Name'].str.lower().str.replace(r'(\w+) (\w+)', r'\1.\[email protected]', regex=True)
import networkx as nx
import pandas as pd
# create graph
G = nx.from_pandas_edgelist(df, source='Mgr_Email', target='Email',
                            create_using=nx.DiGraph)
# find roots (= top managers)
roots = [n for n,d in G.in_degree() if d==0]
# ['[email protected]']
# for each employee, find the hierarchy
df2 = (pd.DataFrame([next((p for root in roots
                           for p in nx.all_simple_paths(G, root, node)), [])[:-1]
                     for node in df['Email']], index=df.index)
         .rename(columns=lambda x: f'Mgr_Lvl_{x+1:02d}')
      )
# join to original DataFrame
out = df.drop(columns='Email').join(df2)
Output:
Name Mgr_Email Mgr_Lvl_01 Mgr_Lvl_02 Mgr_Lvl_03 Mgr_Lvl_04
0 Sally Po [email protected] [email protected] None None None
1 Sean Sea [email protected] [email protected] [email protected] None None
2 Jacob Hin [email protected] [email protected] [email protected] [email protected] [email protected]
3 Tim Buick [email protected] [email protected] [email protected] [email protected] None
4 Kris Olt [email protected] [email protected] [email protected] [email protected] None
5 Cindy Myers [email protected] [email protected] [email protected] None None
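For larger datasets, one possible way to avoid running a path search per employee is to traverse the graph once from each root. The following is only a minimal sketch, not necessarily the alternative referred to above. It assumes every employee has exactly one manager (so the path from the top manager to each employee is unique) and that an `Email` column already exists. The `example.com` addresses are invented for illustration, and `single_source_shortest_path` is swapped in for `all_simple_paths`.

```python
import networkx as nx
import pandas as pd

# Hypothetical data: these addresses are made up for illustration only.
df = pd.DataFrame({
    "Email": ["sally@example.com", "sean@example.com", "jacob@example.com"],
    "Mgr_Email": ["ceo@example.com", "sally@example.com", "sean@example.com"],
})

# Directed edges point from manager to report.
G = nx.from_pandas_edgelist(df, source="Mgr_Email", target="Email",
                            create_using=nx.DiGraph)

# One traversal per root (top manager) collects the path to every node below it.
paths = {}
for root in [n for n, d in G.in_degree() if d == 0]:
    paths.update(nx.single_source_shortest_path(G, root))

# Drop the last element (the employee) so only the managers above remain.
levels = pd.DataFrame([paths.get(email, [])[:-1] for email in df["Email"]],
                      index=df.index)
levels = levels.rename(columns=lambda x: f"Mgr_Lvl_{x + 1:02d}")

out = df.join(levels)
print(out)
```

As in the code above, `Mgr_Lvl_01` holds the top manager and later columns move down toward the employee's direct manager; shorter chains are padded with missing values.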