In the dataframe below:
  patient       admit   discharge   30d_admit
0       a  2022-03-01  2022-03-01  2022-01-31
1       a  2022-02-02  2022-02-02  2022-01-04
2       a  2022-01-28  2022-01-28  2021-12-30
3       b  2022-12-04  2022-12-04  2022-11-05
4       c  2022-11-13  2022-11-13  2022-10-15
5       c  2022-10-23  2022-10-23  2022-09-24
I want to check, for patients "a", "b", and "c", whether a discharge date falls between 30d_admit and the admit date; when it does, the counter should be set high (1), for example:
  patient       admit   discharge   30d_admit  count
0       a  2022-03-01  2022-03-01  2022-01-31      1
1       a  2022-02-02  2022-02-02  2022-01-04      1
2       a  2022-01-28  2022-01-28  2021-12-30      0
3       b  2022-12-04  2022-12-04  2022-11-05      0
4       c  2022-11-13  2022-11-13  2022-10-15      1
5       c  2022-10-23  2022-10-23  2022-09-24      0
Within each patient a, b, c, I want to count, for a particular admission, the discharge dates that fall between that admission's 30d_admit and admit date.
For patient a:
                  admit       discharge      30d_admit
first check   2022-03-01  >  2022-03-01  >  2022-01-31   count=0
second check  2022-03-01  >  2022-02-02  >  2022-01-31   count=1
third check   2022-03-01  >  2022-01-28  >  2022-01-31   count=0
Total count for the 2022-03-01 admission = 1. The same check needs to be run for the other admit dates in the grouped data.
Please provide a solution to the above.
I tried iterrows() on the grouped data. Please show how iterrows can be used on a grouped dataframe.
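For reference, a minimal sketch of the per-patient check described above, using iterrows inside a groupby. Column names are taken from the sample data; one assumption: 30d_admit is recomputed here as admit minus 30 days (which yields 2022-01-30 rather than 2022-01-31 for the first row, without changing any count):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "patient": ["a", "a", "a", "b", "c", "c"],
        "admit": ["2022-03-01", "2022-02-02", "2022-01-28",
                  "2022-12-04", "2022-11-13", "2022-10-23"],
        "discharge": ["2022-03-01", "2022-02-02", "2022-01-28",
                      "2022-12-04", "2022-11-13", "2022-10-23"],
    }
)
for col in ("admit", "discharge"):
    df[col] = pd.to_datetime(df[col])

# assumption: 30d_admit is simply admit minus 30 days
df["30d_admit"] = df["admit"] - pd.Timedelta("30D")

df["count"] = 0
for _, group in df.groupby("patient"):
    # iterate over this patient's admissions only
    for idx, row in group.iterrows():
        # discharges of the same patient strictly inside (30d_admit, admit)
        in_window = (group["discharge"] > row["30d_admit"]) & (
            group["discharge"] < row["admit"]
        )
        df.loc[idx, "count"] = int(in_window.sum())

print(df[["patient", "admit", "discharge", "count"]])
```

This reproduces the desired count column (1, 1, 0, 0, 1, 0) for the sample rows.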
Using the dataframe you provided:
import pandas as pd

df = pd.DataFrame(
    {
        "patient": ["a", "a", "a", "b", "c", "c"],
        "admit": [
            "2022-03-01",
            "2022-02-02",
            "2022-01-28",
            "2022-12-04",
            "2022-11-13",
            "2022-10-23",
        ],
        "discharge": [
            "2022-03-01",
            "2022-02-02",
            "2022-01-28",
            "2022-12-04",
            "2022-11-13",
            "2022-10-23",
        ],
    }
)
Here is one way to do it with Pandas to_datetime, concat, assign, groupby, and Timedelta:
for col in ("admit", "discharge"):
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d")

new_df = pd.concat(
    [
        df.assign(
            time_period=f"{time_period[0].strftime('%Y-%m-%d')} "
            f"to {time_period[1].strftime('%Y-%m-%d')}",
            count=(df["discharge"] > time_period[0])
            & (df["discharge"] < time_period[1]),
        )
        .groupby(["time_period", "patient"])
        .agg({"count": "sum"})
        for time_period in [
            (admit_date - pd.Timedelta("30D"), admit_date)
            for admit_date in df["admit"].unique()
        ]
    ]
)
Then:
print(new_df)
# Output
                                  count
time_period              patient
2022-01-30 to 2022-03-01 a            1
                         b            0
                         c            0
2022-01-03 to 2022-02-02 a            1
                         b            0
                         c            0
2021-12-29 to 2022-01-28 a            0
                         b            0
                         c            0
2022-11-04 to 2022-12-04 a            0
                         b            0
                         c            1
2022-10-14 to 2022-11-13 a            0
                         b            0
                         c            1
2022-09-23 to 2022-10-23 a            0
                         b            0
                         c            0
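If you want the counts as a column on the original frame (rather than indexed by time_period), here is a vectorized per-patient alternative that avoids iterrows entirely — a sketch assuming the same sample data and that 30d_admit is admit minus 30 days. It broadcasts each group's discharge dates against that group's admission windows:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "patient": ["a", "a", "a", "b", "c", "c"],
        "admit": pd.to_datetime(["2022-03-01", "2022-02-02", "2022-01-28",
                                 "2022-12-04", "2022-11-13", "2022-10-23"]),
        "discharge": pd.to_datetime(["2022-03-01", "2022-02-02", "2022-01-28",
                                     "2022-12-04", "2022-11-13", "2022-10-23"]),
    }
)

def count_in_window(g: pd.DataFrame) -> pd.Series:
    d = g["discharge"].to_numpy()                       # shape (n,)
    hi = g["admit"].to_numpy()                          # window end per row
    lo = (g["admit"] - pd.Timedelta("30D")).to_numpy()  # window start per row
    # (n, n) matrix: entry [i, j] is True when discharge j lies strictly
    # inside admission i's 30-day window; summing each row gives the count
    inside = (d > lo[:, None]) & (d < hi[:, None])
    return pd.Series(inside.sum(axis=1), index=g.index)

df["count"] = df.groupby("patient", group_keys=False).apply(count_in_window)
print(df)
```

Because each returned Series keeps the group's original index, the assignment aligns the counts back onto the original rows, giving 1, 1, 0, 0, 1, 0 for the sample data.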