I have a situation where I need to calculate the tenure of employees at company X. The data is provided in the following format:
df =
EID Event_Name Event_Date
1 Hired 03/09/1990 00:00:00
1 Terminated 14/10/2005 00:00:00
1 Rehired 02/11/2015 00:00:00
2 Hired 03/10/1990 00:00:00
2 Terminated 15/10/2005 00:00:00
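The tenure calculation is straightforward:
First, calculate the difference between the Hired and Terminated dates. Second, calculate the difference between today's date and the Rehired date (only if the employee was rehired; otherwise skip this step).
Example: for EID = 1, the tenure is: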
(LAST JOB WORKED WITH COMPANY X) 14/10/2005 00:00:00 - 03/09/1990 00:00:00 = 5520 days
(REHIRED WITH COMP X AND STILL WORKING) 25/04/2020 00:00:00 - 02/11/2015 00:00:00 = 1636 days
Total = (5520 + 1636) / 365 = 19.6 years. The same applies to every other EID.
The output should look like this:
EID Tenure(Years)
1 19.6
2 15.04
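As a cross-check of the arithmetic, here is a plain-Python sketch of the rule described above. It assumes each employee's events are given as a chronologically sorted list of (event_name, date) pairs and that "today" is 25/04/2020, the date used in the EID = 1 example. `tenure_days` is a hypothetical helper written for illustration, not part of pandas.

```python
from datetime import date

def tenure_days(events, today):
    """Total days employed for one employee (hypothetical helper).

    `events` is a chronologically sorted list of (event_name, date) pairs.
    A Hired/Rehired date starts an interval and the next Terminated date
    ends it; an interval still open after the last event ends at `today`.
    """
    total = 0
    start = None
    for name, when in events:
        if name in ('Hired', 'Rehired'):
            start = when
        elif name == 'Terminated' and start is not None:
            total += (when - start).days
            start = None
    if start is not None:  # still employed, count up to today
        total += (today - start).days
    return total

# Assumption: "today" is 25/04/2020, as in the EID = 1 example
today = date(2020, 4, 25)
eid1 = [('Hired', date(1990, 9, 3)),
        ('Terminated', date(2005, 10, 14)),
        ('Rehired', date(2015, 11, 2))]
eid2 = [('Hired', date(1990, 10, 3)),
        ('Terminated', date(2005, 10, 15))]

for eid, events in [(1, eid1), (2, eid2)]:
    days = tenure_days(events, today)
    print(eid, days, round(days / 365, 2))
# 1 7156 19.61
# 2 5491 15.04
```

Dividing by 365 ignores leap days, which matches the calculation in the question; 7156 is exactly 5520 + 1636.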
This solution works as long as, within each EID group, the Event_Name column never contains two consecutive Terminated values:
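import pandas as pd

#converting to datetimes
df['Event_Date'] = pd.to_datetime(df['Event_Date'], dayfirst=True)
#sort so each row is followed by the next event of the same EID
df = df.sort_values(['EID', 'Event_Date'])
#today datetime
now = pd.Timestamp.now().floor('d')
#shift Event_Date up one row per group: each Hired/Rehired row gets the date
#of the next event (its Terminated date); the last row per group gets today
df['new'] = df.groupby('EID')['Event_Date'].shift(-1).fillna(now)
#remove rows with Terminated
df = df[df['Event_Name'].ne('Terminated')].copy()
#difference (a timedelta per employment period)
df['Tenure(Years)'] = df['new'].sub(df['Event_Date'])
#sum timedeltas per EID, convert to whole days, then to years
df = df.groupby('EID')['Tenure(Years)'].sum().dt.days.div(365).reset_index()
print(df)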
EID Tenure(Years)
0 1 19.605479
1 2 15.043836
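Because `now` changes every day, the values printed above only match when the code runs on 25/04/2020. A self-contained variant might rebuild the sample frame, pin "today" to that date (an assumption made purely for reproducibility), and assert the no-consecutive-Terminated precondition before computing:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'EID': [1, 1, 1, 2, 2],
    'Event_Name': ['Hired', 'Terminated', 'Rehired', 'Hired', 'Terminated'],
    'Event_Date': ['03/09/1990 00:00:00', '14/10/2005 00:00:00',
                   '02/11/2015 00:00:00', '03/10/1990 00:00:00',
                   '15/10/2005 00:00:00'],
})
df['Event_Date'] = pd.to_datetime(df['Event_Date'], format='%d/%m/%Y %H:%M:%S')
df = df.sort_values(['EID', 'Event_Date'])

# Guard for the stated precondition: no two Terminated events in a row per EID
prev_name = df.groupby('EID')['Event_Name'].shift()
consecutive = df['Event_Name'].eq('Terminated') & prev_name.eq('Terminated')
assert not consecutive.any(), 'consecutive Terminated events found'

# Assumption: "today" pinned to 25/04/2020 so the result is reproducible
now = pd.Timestamp('2020-04-25')

# Each row takes the next event's date in its group, or today if it is last
df['new'] = df.groupby('EID')['Event_Date'].shift(-1).fillna(now)
worked = df[df['Event_Name'].ne('Terminated')]

# Sum the employment periods per EID and convert days to years
out = ((worked['new'] - worked['Event_Date'])
       .groupby(worked['EID']).sum()
       .dt.days.div(365)
       .rename('Tenure(Years)')
       .reset_index())
print(out)
```

With the pinned date this should reproduce the 19.605479 and 15.043836 values shown above; swap `now` back to `pd.Timestamp.now().floor('d')` for live use.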