How do I calculate the no-show rate of medical appointments based on a patient's previous appointments in a pandas DataFrame?


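I am working with a healthcare dataset from Kaggle (https://www.kaggle.com/joniarroba/noshowappointments) that contains information about medical appointments in Brazil and whether the patient showed up. The dataset has columns for the appointment ID, the patient ID, the appointment date and time, the scheduled date and time, and several other features.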

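I want to add a column to the DataFrame that shows, for each appointment, the no-show rate based on the patient's previous appointments. For example, if a patient has had three appointments and showed up for two of them, the no-show rate for their fourth appointment would be 1/3. If it is the patient's first appointment, the no-show rate would be 0.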
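
To pin down the definition above, here is a minimal pure-Python sketch for a single patient. The function name `prior_no_show_rates` and the boolean input format are hypothetical and chosen only for illustration. It assumes the list is already in chronological order.

```python
from fractions import Fraction


def prior_no_show_rates(no_shows):
    """For each appointment, return the share of earlier appointments missed.

    `no_shows` is a chronologically ordered list of booleans for one patient,
    where True means the patient did not show up. A first appointment has no
    history, so its rate is defined as 0.
    """
    rates = []
    missed = 0  # number of earlier appointments the patient missed
    for seen, flag in enumerate(no_shows):  # `seen` = number of earlier appointments
        rates.append(Fraction(missed, seen) if seen else Fraction(0))
        missed += int(flag)  # the current outcome only affects later appointments
    return rates


# Three past appointments with one miss, followed by a fourth appointment.
history = [False, True, False, False]
print(prior_no_show_rates(history))
```

The last element of the result is `Fraction(1, 3)`, which matches the example in the text: one miss out of three earlier appointments.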

    # convert appointment and scheduled dates to datetime format
    df['AppointmentDay'] = pd.to_datetime(df['AppointmentDay'])
    df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'])

    # create a new column with the time difference between scheduled and appointment date
    df['time_diff'] = (df['AppointmentDay'].dt.date - df['ScheduledDay'].dt.date).dt.days

    # group by PatientId and calculate no-show count and appointment count for each group
    grouped = df.groupby('PatientId')['No-show'].apply(
        lambda x: x.eq('Yes').cumsum().shift().fillna(0))
    df['no_show_count'] = grouped
    df['appointment_count'] = grouped + df.groupby('PatientId').cumcount()

    # calculate no-show rate for each patient
    df['no_show_rate'] = df['no_show_count'] / df['appointment_count']

    # replace NaN values in 'no_show_rate' column with 0
    df['no_show_rate'] = df['no_show_rate'].fillna(0)

    # print first 5 rows
    print(df.head())

The problem with this code is that it takes the current appointment into account in the calculation. For example, if you inspect a single patient with

    df[df['PatientId'] == 112397157856688.0].sort_values('AppointmentDay')

you will see what I mean.
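
One possible way to restrict the statistics to earlier appointments is sketched below. It is not a definitive fix: the helper name `add_prior_no_show_rate` and the toy data are made up for illustration, and only the column names `PatientId`, `AppointmentDay` and `No-show` are taken from the question. The idea is to sort each patient's rows chronologically, subtract the current row from the running sum of misses, and use `cumcount()` on its own as the number of prior appointments.

```python
import pandas as pd


def add_prior_no_show_rate(df):
    """Return a copy of df with no-show statistics based only on earlier appointments."""
    out = df.copy()
    out["AppointmentDay"] = pd.to_datetime(out["AppointmentDay"])
    # Sort chronologically so that "earlier rows" means "earlier appointments".
    out = out.sort_values(["PatientId", "AppointmentDay"])

    missed = out["No-show"].eq("Yes").astype(int)
    by_patient = missed.groupby(out["PatientId"])

    # cumsum() includes the current row, so subtract it to keep only prior misses.
    out["no_show_count"] = by_patient.cumsum() - missed
    # cumcount() is 0 on a patient's first row: the number of prior appointments.
    out["appointment_count"] = by_patient.cumcount()

    # 0 / 0 on a first appointment gives NaN; the question asks for 0 there.
    out["no_show_rate"] = (out["no_show_count"] / out["appointment_count"]).fillna(0.0)
    # Restore the original row order (assumes the index was unique).
    return out.sort_index()


# Hypothetical toy data, deliberately out of chronological order for patient 1.
toy = pd.DataFrame({
    "PatientId": [1, 1, 1, 1, 2],
    "AppointmentDay": ["2016-05-01", "2016-05-03", "2016-05-02", "2016-05-04", "2016-05-01"],
    "No-show": ["No", "No", "Yes", "No", "Yes"],
})
result = add_prior_no_show_rate(toy)
print(result[["PatientId", "AppointmentDay", "no_show_count", "appointment_count", "no_show_rate"]])
```

For patient 1 the rates in chronological order are 0, 0, 1/2 and 1/3, and patient 2's only appointment gets 0. Two things differ from the attempt above: the rows are sorted by date before any cumulative operation, and the denominator is `cumcount()` alone rather than `cumcount()` plus the miss count. If `AppointmentDay` carries no time of day, several appointments by the same patient on one day are ordered arbitrarily. Adding a secondary sort key such as `ScheduledDay` is one option, though whether that reflects the real order is an assumption.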
python pandas dataframe data-analysis kaggle