I have a DataFrame as shown below.
Doctor Appointment Booking_ID
A 2020-01-18 12:00:00 1
A 2020-01-18 12:30:00 2
A 2020-01-18 13:00:00 3
A 2020-01-18 13:00:00 4
A 2020-01-19 13:00:00 13
A 2020-01-19 13:30:00 14
B 2020-01-18 12:00:00 5
B 2020-01-18 12:30:00 6
B 2020-01-18 13:00:00 7
B 2020-01-25 12:30:00 6
B 2020-01-25 13:00:00 7
C 2020-01-19 12:00:00 19
C 2020-01-19 12:30:00 20
C 2020-01-19 13:00:00 21
C 2020-01-22 12:30:00 20
C 2020-01-22 13:00:00 21
From the above, I want to create a column called Session, as shown below.
Expected output:
Doctor Appointment Booking_ID Session
A 2020-01-18 12:00:00 1 S1
A 2020-01-18 12:30:00 2 S1
A 2020-01-18 13:00:00 3 S1
A 2020-01-18 13:00:00 4 S1
A 2020-01-19 13:00:00 13 S2
A 2020-01-19 13:30:00 14 S2
B 2020-01-18 12:00:00 5 S3
B 2020-01-18 12:30:00 6 S3
B 2020-01-18 13:00:00 7 S3
B 2020-01-25 12:30:00 6 S4
B 2020-01-25 13:00:00 7 S4
C 2020-01-19 12:00:00 19 S5
C 2020-01-19 12:30:00 20 S5
C 2020-01-19 13:00:00 21 S5
C 2020-01-22 12:30:00 20 S6
C 2020-01-22 13:00:00 21 S6
Each session should correspond to one doctor on one appointment date (at day granularity).
I tried the following:
import pandas as pd

df = df.sort_values(['Doctor', 'Appointment'], ascending=True)
df['Appointment'] = pd.to_datetime(df['Appointment'])
dates = df['Appointment'].dt.date
# factorize() numbers each distinct date in order of first appearance, ignoring the doctor
df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)
But this assigns sessions based on the date alone. I also want to take the doctor into account.
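A minimal sketch of how the factorize attempt above could take the doctor into account, assuming the sample data from the question: build one combined key per (Doctor, calendar day) and factorize that key instead of the date alone. The `|` separator and the `key` name are arbitrary choices for illustration. Because `factorize` numbers keys in order of first appearance, the frame is sorted by doctor and appointment first.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("A", "2020-01-18 12:00:00", 1),
    ("A", "2020-01-18 12:30:00", 2),
    ("A", "2020-01-18 13:00:00", 3),
    ("A", "2020-01-18 13:00:00", 4),
    ("A", "2020-01-19 13:00:00", 13),
    ("A", "2020-01-19 13:30:00", 14),
    ("B", "2020-01-18 12:00:00", 5),
    ("B", "2020-01-18 12:30:00", 6),
    ("B", "2020-01-18 13:00:00", 7),
    ("B", "2020-01-25 12:30:00", 6),
    ("B", "2020-01-25 13:00:00", 7),
    ("C", "2020-01-19 12:00:00", 19),
    ("C", "2020-01-19 12:30:00", 20),
    ("C", "2020-01-19 13:00:00", 21),
    ("C", "2020-01-22 12:30:00", 20),
    ("C", "2020-01-22 13:00:00", 21),
]
df = pd.DataFrame(rows, columns=["Doctor", "Appointment", "Booking_ID"])

df["Appointment"] = pd.to_datetime(df["Appointment"])
df = df.sort_values(["Doctor", "Appointment"])

# One key per (Doctor, calendar day): the same date for two different
# doctors produces two different keys (e.g. "A|2020-01-18" vs "B|2020-01-18").
key = df["Doctor"] + "|" + df["Appointment"].dt.strftime("%Y-%m-%d")

# factorize assigns 0, 1, 2, ... in order of first appearance,
# which is why the frame was sorted above.
codes, _ = pd.factorize(key)
df["Session"] = "S" + pd.Series(codes + 1, index=df.index).astype(str)
print(df)
```

The string key is only one way to combine the two columns; any key that is unique per doctor and day would work the same way with `factorize`.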
df['Session'] = 'S' + (df.groupby(['Doctor',pd.to_datetime(df['Appointment']).dt.date])
.ngroup()
.add(1).astype(str))
Output:
 Doctor Appointment Booking_ID Session
0 A 2020-01-18 12:00:00 1 S1
1 A 2020-01-18 12:30:00 2 S1
2 A 2020-01-18 13:00:00 3 S1
3 A 2020-01-18 13:00:00 4 S1
4 A 2020-01-19 13:00:00 13 S2
5 A 2020-01-19 13:30:00 14 S2
6 B 2020-01-18 12:00:00 5 S3
7 B 2020-01-18 12:30:00 6 S3
8 B 2020-01-18 13:00:00 7 S3
9 B 2020-01-25 12:30:00 6 S4
10 B 2020-01-25 13:00:00 7 S4
11 C 2020-01-19 12:00:00 19 S5
12 C 2020-01-19 12:30:00 20 S5
13 C 2020-01-19 13:00:00 21 S5
14 C 2020-01-22 12:30:00 20 S6
15 C 2020-01-22 13:00:00 21 S6
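A small check, assuming the question's data, of why the `ngroup()` answer does not depend on row order: with the default `sort=True`, `groupby` numbers the groups starting at 0 in sorted order of the (Doctor, date) keys, and `+ 1` shifts the numbering to start at 1. The shuffle below (`random_state=0` is an arbitrary choice) exists only to illustrate this.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("A", "2020-01-18 12:00:00", 1),
    ("A", "2020-01-18 12:30:00", 2),
    ("A", "2020-01-18 13:00:00", 3),
    ("A", "2020-01-18 13:00:00", 4),
    ("A", "2020-01-19 13:00:00", 13),
    ("A", "2020-01-19 13:30:00", 14),
    ("B", "2020-01-18 12:00:00", 5),
    ("B", "2020-01-18 12:30:00", 6),
    ("B", "2020-01-18 13:00:00", 7),
    ("B", "2020-01-25 12:30:00", 6),
    ("B", "2020-01-25 13:00:00", 7),
    ("C", "2020-01-19 12:00:00", 19),
    ("C", "2020-01-19 12:30:00", 20),
    ("C", "2020-01-19 13:00:00", 21),
    ("C", "2020-01-22 12:30:00", 20),
    ("C", "2020-01-22 13:00:00", 21),
]
df = pd.DataFrame(rows, columns=["Doctor", "Appointment", "Booking_ID"])
df["Appointment"] = pd.to_datetime(df["Appointment"])

# Shuffle the rows (index labels are kept) so the input is no longer sorted.
shuffled = df.sample(frac=1, random_state=0)

# With the default sort=True, groups are numbered in sorted key order:
# (A, 2020-01-18) -> 0, (A, 2020-01-19) -> 1, (B, 2020-01-18) -> 2, and so on.
groups = shuffled.groupby(["Doctor", shuffled["Appointment"].dt.date])
shuffled["Session"] = "S" + (groups.ngroup() + 1).astype(str)

# Restore the original row order for display.
print(shuffled.sort_index())
```

With `groupby(..., sort=False)`, `ngroup()` would instead number the groups in order of first appearance, so the labels would follow the row order of the frame.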
Another approach, using groupby().ngroup():
# convert to datetime
df.Appointment = pd.to_datetime(df.Appointment)
df['Session'] = 'S' + (df.groupby(['Doctor', df.Appointment.dt.date]).ngroup()+1).astype(str)
Output:
 Doctor Appointment Booking_ID Session
0 A 2020-01-18 12:00:00 1 S1
1 A 2020-01-18 12:30:00 2 S1
2 A 2020-01-18 13:00:00 3 S1
3 A 2020-01-18 13:00:00 4 S1
4 A 2020-01-19 13:00:00 13 S2
5 A 2020-01-19 13:30:00 14 S2
6 B 2020-01-18 12:00:00 5 S3
7 B 2020-01-18 12:30:00 6 S3
8 B 2020-01-18 13:00:00 7 S3
9 B 2020-01-25 12:30:00 6 S4
10 B 2020-01-25 13:00:00 7 S4
11 C 2020-01-19 12:00:00 19 S5
12 C 2020-01-19 12:30:00 20 S5
13 C 2020-01-19 13:00:00 21 S5
14 C 2020-01-22 12:30:00 20 S6
15 C 2020-01-22 13:00:00 21 S6
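A variant of the same idea, sketched under the assumption that `Appointment` has already been converted to datetime64: `dt.normalize()` truncates each timestamp to midnight, so the day key keeps a datetime dtype instead of the Python `date` objects produced by `dt.date`. The grouping and the numbering are otherwise unchanged; the name `day` is just an illustrative choice.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("A", "2020-01-18 12:00:00", 1),
    ("A", "2020-01-18 12:30:00", 2),
    ("A", "2020-01-18 13:00:00", 3),
    ("A", "2020-01-18 13:00:00", 4),
    ("A", "2020-01-19 13:00:00", 13),
    ("A", "2020-01-19 13:30:00", 14),
    ("B", "2020-01-18 12:00:00", 5),
    ("B", "2020-01-18 12:30:00", 6),
    ("B", "2020-01-18 13:00:00", 7),
    ("B", "2020-01-25 12:30:00", 6),
    ("B", "2020-01-25 13:00:00", 7),
    ("C", "2020-01-19 12:00:00", 19),
    ("C", "2020-01-19 12:30:00", 20),
    ("C", "2020-01-19 13:00:00", 21),
    ("C", "2020-01-22 12:30:00", 20),
    ("C", "2020-01-22 13:00:00", 21),
]
df = pd.DataFrame(rows, columns=["Doctor", "Appointment", "Booking_ID"])
df["Appointment"] = pd.to_datetime(df["Appointment"])

# Midnight of each appointment's day; still a datetime64 Series.
day = df["Appointment"].dt.normalize()

# One group per (Doctor, day), numbered in sorted key order, starting at S1.
df["Session"] = "S" + (df.groupby(["Doctor", day]).ngroup() + 1).astype(str)
print(df)
```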