Comparing date columns in grouped data


In the dataframe below,

       patient      admit       discharge     30d_admit
   0       a       2022-03-01   2022-03-01   2022-01-31
   1       a       2022-02-02   2022-02-02   2022-01-04
   2       a       2022-01-28   2022-01-28   2021-12-30
   3       b       2022-12-04   2022-12-04   2022-11-05
   4       c       2022-11-13   2022-11-13   2022-10-15
   5       c       2022-10-23   2022-10-23   2022-09-24

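I want to check, for patients "a", "b", and "c", whether a discharge date falls between 30d_admit and the admit date; if it does, the count should be incremented, for example: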

       patient      admit       discharge     30d_admit   count
   0       a       2022-03-01   2022-03-01   2022-01-31     1
   1       a       2022-02-02   2022-02-02   2022-01-04     1
   2       a       2022-01-28   2022-01-28   2021-12-30     0
   3       b       2022-12-04   2022-12-04   2022-11-05     0
   4       c       2022-11-13   2022-11-13   2022-10-15     1
   5       c       2022-10-23   2022-10-23   2022-09-24     0

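For each of patients a, b, and c, I want to count the discharge dates that fall between 30d_admit and the admit date of a given admission.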


    for patient a  
                         admit        discharge     30d_admit    
     first check       2022-03-01 > 2022-03-01 > 2022-01-31   count=0
     second check      2022-03-01 > 2022-02-02 > 2022-01-31   count=1
     third check       2022-03-01 > 2022-01-28 > 2022-01-31   count=0


Total count for the 2022-03-01 admission = 1. The other admit dates in the grouped data need to be checked in the same way.

Please provide a solution to the problem above.

I tried iterrows() on the grouped data. Please show how to use iterrows on a grouped dataframe.

pandas iteration
1 Answer

Using the dataframe you provided:

import pandas as pd

df = pd.DataFrame(
    {
        "patient": ["a", "a", "a", "b", "c", "c"],
        "admit": [
            "2022-03-01",
            "2022-02-02",
            "2022-01-28",
            "2022-12-04",
            "2022-11-13",
            "2022-10-23",
        ],
        "discharge": [
            "2022-03-01",
            "2022-02-02",
            "2022-01-28",
            "2022-12-04",
            "2022-11-13",
            "2022-10-23",
        ],
    }
)

Here is one way to do it, using Pandas to_datetime, concat, assign, groupby, and Timedelta:

# Parse the date strings into datetime values so that Timedelta arithmetic works
for col in ("admit", "discharge"):
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d")


new_df = pd.concat(
    [
        df.assign(
            time_period=f"{time_period[0].strftime('%Y-%m-%d')} "
            f"to {time_period[1].strftime('%Y-%m-%d')}",
            count=(df["discharge"] > time_period[0])
            & (df["discharge"] < time_period[1]),
        )
        .groupby(["time_period", "patient"])
        .agg({"count": "sum"})
        for time_period in [
            (admit_date - pd.Timedelta("30D"), admit_date)
            for admit_date in df["admit"].unique()
        ]
    ]
)
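
To make the comprehension above easier to follow, here is a minimal sketch of what happens for a single admit date. The small discharge series (patient "a" only) and the names window_start, in_window, and n_in_window are illustrative assumptions, not part of the code above.

```python
import pandas as pd

# Illustrative subset: the three discharge dates of patient "a"
discharge = pd.Series(pd.to_datetime(["2022-03-01", "2022-02-02", "2022-01-28"]))

# One admit date and the start of its 30-day look-back window
admit_date = pd.Timestamp("2022-03-01")
window_start = admit_date - pd.Timedelta("30D")

# Strict inequalities: a discharge exactly on either edge is not counted
in_window = (discharge > window_start) & (discharge < admit_date)

# Summing a boolean mask counts the True values
n_in_window = int(in_window.sum())
print(window_start.date(), in_window.tolist(), n_in_window)
```

The full solution repeats this for every unique admit date and then sums the mask per patient.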

Then print new_df:

print(new_df)
# Output

                                  count
time_period              patient       
2022-01-30 to 2022-03-01 a            1
                         b            0
                         c            0
2022-01-03 to 2022-02-02 a            1
                         b            0
                         c            0
2021-12-29 to 2022-01-28 a            0
                         b            0
                         c            0
2022-11-04 to 2022-12-04 a            0
                         b            0
                         c            1
2022-10-14 to 2022-11-13 a            0
                         b            0
                         c            1
2022-09-23 to 2022-10-23 a            0
                         b            0
                         c            0
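
Note that the output above checks every patient's discharges against every admission window, whereas the question asks for one count per row, restricted to the same patient. Since the question also asks specifically about iterrows on grouped data, the following is a rough sketch of that variant. It assumes the window is the admit date minus 30 days with strict inequalities on both ends; the names counts, window_start, and in_window are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "patient": ["a", "a", "a", "b", "c", "c"],
        "admit": pd.to_datetime(
            ["2022-03-01", "2022-02-02", "2022-01-28",
             "2022-12-04", "2022-11-13", "2022-10-23"]
        ),
        "discharge": pd.to_datetime(
            ["2022-03-01", "2022-02-02", "2022-01-28",
             "2022-12-04", "2022-11-13", "2022-10-23"]
        ),
    }
)

# One counter per original row, filled in group by group
counts = pd.Series(0, index=df.index)

for patient, group in df.groupby("patient"):
    # iterrows() on the group yields (original index label, row) pairs
    for idx, row in group.iterrows():
        # Assumed window: the 30 days before this admission
        window_start = row["admit"] - pd.Timedelta(days=30)
        # Compare this admission's window with all discharges of the same patient
        in_window = (group["discharge"] > window_start) & (
            group["discharge"] < row["admit"]
        )
        counts.loc[idx] = int(in_window.sum())

df["count"] = counts
print(df)
```

On this sample data the count column matches the one requested in the question (1, 1, 0, 0, 1, 0). For larger frames, a self-merge on patient followed by a groupby sum would probably be faster than nested iteration.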