Python pandas: cumulatively create a new string column based on conditions in other columns


Suppose I have a dataset (df):

Group | Employee_Title | Employee_Name  
A     | Manager        | John     
A     | Analyst        | Adam     
A     | Analyst        | Smith    
B     | Manager        | Bill    
B     | Analyst        | Ed    
B     | Analyst        | Jay 

I want to create a new column, "Group_Manager", so that the new dataset looks like this:

Group | Employee_Title | Employee_Name | Group_Manager 
A     | Manager        | John          | John
A     | Analyst        | Adam          | John           
A     | Analyst        | Smith         | John    
B     | Manager        | Bill          | Bill    
B     | Analyst        | Ed            | Bill       
B     | Analyst        | Jay           | Bill  

I am looking for Python code that can do this in some kind of "cumulative" way, for example (this does not work as written):

df['Group_Manager']=df.groupby('Group').apply(lambda Employee_Title,Employee_Name: Employee_Name if Employee_Title=="Manager" else keep previous Group_Manager)
python pandas cumsum accumulate
1 Answer

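You can get the result you want by retrieving the manager's name for each group and then reindexing that per-group result by the main DataFrame's 'Group' column: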

import pandas as pd

# Sample data
data = {
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Employee_Title': ['Manager', 'Analyst', 'Analyst', 'Manager', 'Analyst', 'Analyst'],
    'Employee_Name': ['John', 'Adam', 'Smith', 'Bill', 'Ed', 'Jay']
}
df = pd.DataFrame(data)

# Create the Group_Manager column
df['Group_Manager'] = df.groupby('Group').apply(lambda g: g['Employee_Name'][g['Employee_Title'] == 'Manager'].iloc[0]).reindex(df['Group']).reset_index(drop=True)

print(df)

This results in:

  Group Employee_Title Employee_Name Group_Manager
0     A        Manager          John          John
1     A        Analyst          Adam          John
2     A        Analyst         Smith          John
3     B        Manager          Bill          Bill
4     B        Analyst            Ed          Bill
5     B        Analyst           Jay          Bill
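
As an alternative, here is a minimal sketch that avoids `groupby().apply` and is closer to the "cumulative" idea in the question. It masks the names so that only Manager rows keep a value, then fills that value across each group. The helper name `manager_name` and the extra column `Group_Manager_ffill` are made up for illustration. The forward-fill variant assumes the Manager row appears first within its group; the `transform('first')` variant only assumes each group has at least one Manager row.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Employee_Title': ['Manager', 'Analyst', 'Analyst', 'Manager', 'Analyst', 'Analyst'],
    'Employee_Name': ['John', 'Adam', 'Smith', 'Bill', 'Ed', 'Jay'],
})

# Keep the name only on Manager rows; every other row becomes missing (NaN)
manager_name = df['Employee_Name'].where(df['Employee_Title'] == 'Manager')

# Broadcast the first non-missing manager name of each group to all of its rows
df['Group_Manager'] = manager_name.groupby(df['Group']).transform('first')

# "Cumulative" variant: carry the last seen manager name forward within each group
# (assumption: the Manager row comes before the other rows of its group)
df['Group_Manager_ffill'] = manager_name.groupby(df['Group']).ffill()

print(df)
```

Because both variants return a Series aligned to the original index, they do not rely on `df` having a default 0..n-1 index, which the `reindex(...).reset_index(drop=True)` approach does.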