Create a new column in pandas based on a datetime column grouped at the date level

Question

I have a data frame as shown below.

Doctor       Appointment           Booking_ID   
  A          2020-01-18 12:00:00     1 
  A          2020-01-18 12:30:00     2
  A          2020-01-18 13:00:00     3 
  A          2020-01-18 13:00:00     4
  A          2020-01-19 13:00:00     13
  A          2020-01-19 13:30:00     14 
  B          2020-01-18 12:00:00     5 
  B          2020-01-18 12:30:00     6 
  B          2020-01-18 13:00:00     7
  B          2020-01-25 12:30:00     6 
  B          2020-01-25 13:00:00     7
  C          2020-01-19 12:00:00     19 
  C          2020-01-19 12:30:00     20
  C          2020-01-19 13:00:00     21
  C          2020-01-22 12:30:00     20
  C          2020-01-22 13:00:00     21

From the above, I would like to create a column called Session, as shown below.

Expected output:

Doctor       Appointment           Booking_ID   Session
  A          2020-01-18 12:00:00     1          S1
  A          2020-01-18 12:30:00     2          S1
  A          2020-01-18 13:00:00     3          S1
  A          2020-01-18 13:00:00     4          S1
  A          2020-01-29 13:00:00     13         S2
  A          2020-01-29 13:30:00     14         S2
  B          2020-01-18 12:00:00     5          S3
  B          2020-01-18 12:30:00     6          S3
  B          2020-01-18 13:00:00     17         S3
  B          2020-01-25 12:30:00     16         S4
  B          2020-01-25 13:00:00     7          S4
  C          2020-01-19 12:00:00     19         S5
  C          2020-01-19 12:30:00     20         S5
  C          2020-01-19 13:00:00     21         S5
  C          2020-01-22 12:30:00     29         S6
  C          2020-01-22 13:00:00     26         S6
  C          2020-01-22 13:30:00     24         S6

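A session should be distinct for each doctor and each appointment date (at day level, ignoring the time of day).

I tried the following: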

df = df.sort_values(['Doctor', 'Appointment'], ascending=True)


df['Appointment'] = pd.to_datetime(df['Appointment'])
dates = df['Appointment'].dt.date

df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)

But this assigns sessions based on the date only. I also want to take the doctor into account.
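To make the problem concrete, here is a minimal sketch of the attempt on a reduced, made-up sample (four rows chosen for illustration, not the full data above). Because `factorize()` only sees the dates, doctors A and B end up sharing a label on 2020-01-18:

```python
import pandas as pd

# Reduced, illustrative sample: two doctors who share one appointment date.
df = pd.DataFrame({
    "Doctor": ["A", "A", "B", "B"],
    "Appointment": pd.to_datetime([
        "2020-01-18 12:00:00",
        "2020-01-19 13:00:00",
        "2020-01-18 12:00:00",
        "2020-01-25 12:30:00",
    ]),
})

dates = df["Appointment"].dt.date

# factorize() labels each distinct date in order of first appearance;
# the Doctor column plays no part, so A and B collide on 2020-01-18.
codes = dates.factorize()[0]
df["Session"] = "S" + pd.Series(codes + 1, index=df.index).astype(str)

print(df)
```

The first and third rows both get the same session label even though they belong to different doctors, which is the behaviour described above.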

Answer 1

IIUC, use GroupBy.ngroup together with Series.dt.date.
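The code for this answer did not survive on this page, so the following is only a sketch of how `GroupBy.ngroup` and `Series.dt.date` might be combined. It assumes the sample data from the question, rebuilt by hand, and it is not necessarily the answerer's exact code:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "Doctor": list("AAAAAA") + list("BBBBB") + list("CCCCC"),
    "Appointment": [
        "2020-01-18 12:00:00", "2020-01-18 12:30:00", "2020-01-18 13:00:00",
        "2020-01-18 13:00:00", "2020-01-19 13:00:00", "2020-01-19 13:30:00",
        "2020-01-18 12:00:00", "2020-01-18 12:30:00", "2020-01-18 13:00:00",
        "2020-01-25 12:30:00", "2020-01-25 13:00:00",
        "2020-01-19 12:00:00", "2020-01-19 12:30:00", "2020-01-19 13:00:00",
        "2020-01-22 12:30:00", "2020-01-22 13:00:00",
    ],
    "Booking_ID": [1, 2, 3, 4, 13, 14, 5, 6, 7, 6, 7, 19, 20, 21, 20, 21],
})

# Parse the strings so that the .dt accessor is available.
df["Appointment"] = pd.to_datetime(df["Appointment"])

# Group by doctor and calendar date; ngroup() gives each (doctor, date)
# pair a 0-based number, following the sorted order of the group keys.
group_no = df.groupby(["Doctor", df["Appointment"].dt.date]).ngroup()

# Shift to 1-based numbering and prefix with "S".
df["Session"] = "S" + (group_no + 1).astype(str)

print(df)
```

Since `ngroup()` numbers groups by sorted key, the labels do not depend on the row order of the frame, so no prior `sort_values` call is needed for the numbering itself.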

Answer 2

IIUC, this is a job for groupby().ngroup():

df['Session'] = 'S' + (df.groupby(['Doctor',pd.to_datetime(df['Appointment']).dt.date])
                         .ngroup()
                         .add(1).astype(str))

Output:

   Doctor          Appointment  Booking_ID Session
0       A  2020-01-18-12:00:00           1      S1
1       A  2020-01-18-12:30:00           2      S1
2       A  2020-01-18-13:00:00           3      S1
3       A  2020-01-18-13:00:00           4      S1
4       A  2020-01-19-13:00:00          13      S2
5       A  2020-01-19-13:30:00          14      S2
6       B  2020-01-18-12:00:00           5      S3
7       B  2020-01-18-12:30:00           6      S3
8       B  2020-01-18-13:00:00           7      S3
9       B  2020-01-25-12:30:00           6      S4
10      B  2020-01-25-13:00:00           7      S4
11      C  2020-01-19-12:00:00          19      S5
12      C  2020-01-19-12:30:00          20      S5
13      C  2020-01-19-13:00:00          21      S5
14      C  2020-01-22-12:30:00          20      S6
15      C  2020-01-22-13:00:00          21      S6

Answer 3

Another way, using ngroup:

# convert to datetime
df.Appointment = pd.to_datetime(df.Appointment)

df['Session'] = 'S' + (df.groupby(['Doctor', df.Appointment.dt.date]).ngroup()+1).astype(str)

Output:

   Doctor          Appointment  Booking_ID Session
0       A  2020-01-18 12:00:00           1      S1
1       A  2020-01-18 12:30:00           2      S1
2       A  2020-01-18 13:00:00           3      S1
3       A  2020-01-18 13:00:00           4      S1
4       A  2020-01-19 13:00:00          13      S2
5       A  2020-01-19 13:30:00          14      S2
6       B  2020-01-18 12:00:00           5      S3
7       B  2020-01-18 12:30:00           6      S3
8       B  2020-01-18 13:00:00           7      S3
9       B  2020-01-25 12:30:00           6      S4
10      B  2020-01-25 13:00:00           7      S4
11      C  2020-01-19 12:00:00          19      S5
12      C  2020-01-19 12:30:00          20      S5
13      C  2020-01-19 13:00:00          21      S5
14      C  2020-01-22 12:30:00          20      S6
15      C  2020-01-22 13:00:00          21      S6
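A small variation that is not part of any answer here: `Series.dt.normalize()` can stand in for `Series.dt.date` as the day-level key. It sets the time to midnight but keeps the datetime64 dtype instead of producing Python `date` objects. The sketch below uses a reduced, made-up sample and checks that both keys give the same group numbers:

```python
import pandas as pd

# Reduced, illustrative sample (not the full data from the question).
df = pd.DataFrame({
    "Doctor": ["A", "A", "A", "B", "B"],
    "Appointment": pd.to_datetime([
        "2020-01-18 12:00:00",
        "2020-01-18 12:30:00",
        "2020-01-19 13:00:00",
        "2020-01-18 12:00:00",
        "2020-01-25 12:30:00",
    ]),
})

# normalize() truncates each timestamp to midnight, keeping datetime64 dtype.
day = df["Appointment"].dt.normalize()

by_day = df.groupby(["Doctor", day]).ngroup()
by_date = df.groupby(["Doctor", df["Appointment"].dt.date]).ngroup()

df["Session"] = "S" + (by_day + 1).astype(str)

print(df)
print("same numbering as dt.date:", bool((by_day == by_date).all()))
```

Either key should work for this task; `normalize()` may be preferable when the day-level key is reused later in other datetime operations.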
