Suppose I have a dataset (df):
Group | Employee_Title | Employee_Name
A | Manager | John
A | Analyst | Adam
A | Analyst | Smith
B | Manager | Bill
B | Analyst | Ed
B | Analyst | Jay
I want to create a new column, "Group_Manager", so that the new dataset looks like this:
Group | Employee_Title | Employee_Name | Group_Manager
A | Manager | John | John
A | Analyst | Adam | John
A | Analyst | Smith | John
B | Manager | Bill | Bill
B | Analyst | Ed | Bill
B | Analyst | Jay | Bill
I'm looking for Python code that does this in some kind of "cumulative" way, for example (this does not work):
df['Group_Manager']=df.groupby('Group').apply(lambda Employee_Title,Employee_Name: Employee_Name if Employee_Title=="Manager" else keep previous Group_Manager)
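You can get the result you want by retrieving each group's manager name and then reindexing that result against the main DataFrame's 'Group' column: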
import pandas as pd
# Sample data
data = {
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Employee_Title': ['Manager', 'Analyst', 'Analyst', 'Manager', 'Analyst', 'Analyst'],
    'Employee_Name': ['John', 'Adam', 'Smith', 'Bill', 'Ed', 'Jay']
}
df = pd.DataFrame(data)
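# Create the Group_Manager column: find each group's manager name, then
# broadcast it back to every row through the 'Group' column.
# Note: this relies on df having the default 0..n-1 index and on every group having a manager.
managers = df.groupby('Group').apply(lambda g: g['Employee_Name'][g['Employee_Title'] == 'Manager'].iloc[0])
df['Group_Manager'] = managers.reindex(df['Group']).reset_index(drop=True)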
print(df)
This produces:
Group Employee_Title Employee_Name Group_Manager
0 A Manager John John
1 A Analyst Adam John
2 A Analyst Smith John
3 B Manager Bill Bill
4 B Analyst Ed Bill
5 B Analyst Jay Bill
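As a possible alternative to the groupby/apply approach above, here is a minimal sketch of two other ways to build the same column. The first builds a small group-to-manager lookup and applies it with `Series.map`; it does not depend on the DataFrame's index and leaves NaN for any group that has no manager. The second is closer to the "cumulative" idea in the question: keep the name only on manager rows, then forward-fill within each group. The names `managers`, `manager_only`, and `Group_Manager_ffill` are illustrative, and both variants assume at most one manager per group; the forward-fill variant additionally assumes the manager row comes first in its group.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Employee_Title": ["Manager", "Analyst", "Analyst", "Manager", "Analyst", "Analyst"],
    "Employee_Name": ["John", "Adam", "Smith", "Bill", "Ed", "Jay"],
})

# Variant 1: lookup table with one manager name per group.
# drop_duplicates keeps the first manager if a group happens to list several.
managers = (
    df.loc[df["Employee_Title"] == "Manager"]
      .drop_duplicates("Group")
      .set_index("Group")["Employee_Name"]
)
# Map each row's group to its manager; groups without a manager get NaN.
df["Group_Manager"] = df["Group"].map(managers)

# Variant 2 ("cumulative"): blank out non-manager names, then carry the
# last seen manager name forward within each group.
# Assumes the manager row appears before the other rows of its group.
manager_only = df["Employee_Name"].where(df["Employee_Title"] == "Manager")
df["Group_Manager_ffill"] = manager_only.groupby(df["Group"]).ffill()

print(df)
```

On the sample data both new columns should contain the same values as the Group_Manager column shown above. The map variant is usually the safer choice when row order or the index is not guaranteed.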