Counting unique datetimes with groupby in pandas

I have a DataFrame like the one shown below:

Doctor   Start                B_ID  Session      Finish                 NoShow
    A   2020-01-18 12:00:00     1    S1         2020-01-18 12:33:00     no
    A   2020-01-18 12:20:00     2    S1         2020-01-18 12:52:00     no
    A   2020-01-18 13:00:00     3    S1         2020-01-18 13:23:00     no
    A   2020-01-18 13:00:00     4    S1         2020-01-18 13:37:00     yes
    A   2020-01-18 13:35:00     5    S1         2020-01-18 13:56:00     no
    A   2020-01-18 14:10:00     6    S1         2020-01-18 14:15:00     no
    A   2020-01-18 14:10:00     7    S1         2020-01-18 14:28:00     yes
    A   2020-01-18 14:10:00     8    S1         2020-01-18 14:40:00     yes
    A   2020-01-18 14:10:00     9    S1         2020-01-18 15:01:00     no
    A   2020-01-19 12:00:00    12    S2         2020-01-19 12:20:00     no
    A   2020-01-19 12:30:00    13    S2         2020-01-19 12:40:00     no
    A   2020-01-19 13:00:00    14    S2         2020-01-19 13:20:00     yes
    A   2020-01-19 13:40:00    15    S2         2020-01-19 13:46:00     no
    A   2020-01-19 14:00:00    16    S2         2020-01-19 14:10:00     yes
    A   2020-01-19 14:00:00    17    S2         2020-01-19 14:20:00     no
    A   2020-01-19 14:00:00    19    S2         2020-01-19 14:40:00     yes
    B   2020-01-18 12:00:00    21    S3         2020-01-18 12:33:00     no
    B   2020-01-18 12:30:00    22    S3         2020-01-18 12:52:00     no
    B   2020-01-18 13:10:00    23    S3         2020-01-18 13:25:00     no
    B   2020-01-18 13:10:00    24    S3         2020-01-18 13:39:00     no
    B   2020-01-18 13:30:00    25    S3         2020-01-18 13:56:00     yes
    B   2020-01-18 14:05:00    26    S3         2020-01-18 14:15:00     no
    B   2020-01-18 14:30:00    27    S3         2020-01-18 14:48:00     yes

From this, I would like to build the following DataFrame.

Expected output:

Doctor        Day       No_of_slots    No_of_bookings    No_of_NoShow
A         2020-01-18        5               9               3    
A         2020-01-19        5               7               3  
B         2020-01-18        6               7               2

where

 No_of_slots = Total number of slots based on unique Start time

 No_of_bookings = Total number of bookings 

 No_of_NoShow = Number of NoShow == 'yes'
pandas pandas-groupby
1 Answer

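Use GroupBy.agg with named aggregation. To count the 'yes' values, add a helper column new with DataFrame.assign by comparing NoShow against 'yes' with Series.eq and converting the boolean result to integers with Series.astype, then aggregate that column with sum:

import pandas as pd

# Parse the timestamp columns so the .dt accessor is available
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])

# Calendar date of each booking, used as the second grouping key
d = df['Start'].dt.date.rename('Day')
df1 = (df.assign(new = df['NoShow'].eq('yes').astype(int))
         .groupby(['Doctor', d]).agg(No_of_slots=('Start','nunique'),
                                     No_of_bookings=('Start','size'),
                                     No_of_NoShow=('new', 'sum'))
         .reset_index())
print(df1)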
  Doctor         Day  No_of_slots  No_of_bookings  No_of_NoShow
0      A  2020-01-18            5               9             3
1      A  2020-01-19            5               7             3
2      B  2020-01-18            6               7             2
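
To check the approach without the full table, here is a minimal self-contained sketch that rebuilds only the Doctor B rows from the question and applies the same named aggregation. The function name summarize and the helper column is_noshow are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Doctor B rows copied from the question (only the columns the summary needs).
df = pd.DataFrame({
    "Doctor": ["B"] * 7,
    "Start": [
        "2020-01-18 12:00:00", "2020-01-18 12:30:00", "2020-01-18 13:10:00",
        "2020-01-18 13:10:00", "2020-01-18 13:30:00", "2020-01-18 14:05:00",
        "2020-01-18 14:30:00",
    ],
    "NoShow": ["no", "no", "no", "no", "yes", "no", "yes"],
})
df["Start"] = pd.to_datetime(df["Start"])


def summarize(frame):
    """Per doctor and day: unique start times, bookings, and no-shows."""
    # Calendar date of each booking, used as the second grouping key.
    day = frame["Start"].dt.date.rename("Day")
    return (
        # Hypothetical helper column: 1 where the patient did not show up.
        frame.assign(is_noshow=frame["NoShow"].eq("yes").astype(int))
        .groupby(["Doctor", day])
        .agg(
            No_of_slots=("Start", "nunique"),    # distinct Start timestamps
            No_of_bookings=("Start", "size"),    # every row is one booking
            No_of_NoShow=("is_noshow", "sum"),   # count of 'yes' rows
        )
        .reset_index()
    )


summary = summarize(df)
print(summary)
```

The two bookings that share the 13:10 start time count as one slot but two bookings, which is why nunique and size are both applied to the Start column.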