Create an order column in a pandas DataFrame from one date column relative to another


I have an extract in which I need to identify a certain type of surgery, `X`; see the `Surg Type` column.

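I need to keep the medical appointments, which appear as separate rows, that fall within a window around the surgery: the 3 appointments before it (-3, -2, -1) and the 3 appointments after it (+1, +2, +3).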

I have to include this order as an additional column.

In addition, I need to exclude any appointment outside the window and any other `Surg Type`; in this example, any other surgery is denoted `Z`.

In this example, I want to keep 7 of the 9 rows/records, plus an additional column, `Prior Post`.

*** Updated example ***

Original Df

| Patient ID | Surg ID | Surg Type | Surg Date  | Medical Appt | Medical Appt Date |
|------------|---------|-----------|------------|--------------|-------------------|
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-01-01        |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-03-04        |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-05-04        |
| 1          | 1       | X         | 2022-09-03 | N            | NaT               |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-11-04        |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-11-29        |
| 1          | 2       | Z         | 2022-12-01 | N            | NaT               |
| 1          | 1       | X         | 2022-09-03 | Y            | 2023-01-02        |
| 1          | 1       | X         | 2022-09-03 | Y            | 2023-01-13        |
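For anyone who wants to reproduce the problem, the table above could be rebuilt roughly as follows. This is only a sketch: the values are transcribed from the table, the variable name `df` is chosen to match the answer's code, and the missing appointment dates are assumed to be stored as `NaT`.

```python
import pandas as pd

# Sample data transcribed from the "Original Df" table.
df = pd.DataFrame({
    "Patient ID": 1,  # scalar is broadcast to every row
    "Surg ID": [1, 1, 1, 1, 1, 1, 2, 1, 1],
    "Surg Type": ["X", "X", "X", "X", "X", "X", "Z", "X", "X"],
    "Surg Date": ["2022-09-03"] * 6 + ["2022-12-01"] + ["2022-09-03"] * 2,
    "Medical Appt": ["Y", "Y", "Y", "N", "Y", "Y", "N", "Y", "Y"],
    "Medical Appt Date": ["2022-01-01", "2022-03-04", "2022-05-04", None,
                          "2022-11-04", "2022-11-29", None,
                          "2023-01-02", "2023-01-13"],
})

# Parse both date columns; missing appointment dates become NaT.
date_cols = ["Surg Date", "Medical Appt Date"]
df[date_cols] = df[date_cols].apply(pd.to_datetime)
```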



Desired Df

| Patient ID | Surg ID | Surg Type | Surg Date  | Medical Appt | Medical Appt Date | Inclusion   |
|------------|---------|-----------|------------|--------------|-------------------|-------------|
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-01-01        | -3          |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-03-04        | -2          |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-05-04        | -1          |
| 1          | 1       | X         | 2022-09-03 | N            | NaT               |             |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-11-04        | +1          |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-11-29        | +2          |
| 1          | 2       | Z         | 2022-12-01 | N            | NaT               | Exclude row |
| 1          | 1       | X         | 2022-09-03 | Y            | 2023-01-02        | +3          |
| 1          | 1       | X         | 2022-09-03 | Y            | 2023-01-13        | Exclude row |
Tags: pandas, rank

1 answer

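You can filter for surgery type `X`, then compute a `rolling.max` on the sorted dates to keep the ±`N` dates around each surgery (assuming the surgery is the row with `NaT` in `Medical Appt Date`):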

import pandas as pd

# number of medical appointments to keep before/after a surgery
N = 3

# columns to use as grouper
group_cols = ['Patient ID', 'Surg ID']

# ensure datetime
df[['Surg Date', 'Medical Appt Date']] = df[['Surg Date', 'Medical Appt Date']].apply(pd.to_datetime)

# filter out the non-X types
# sort by date, compute a groupby.rolling.max
# identify the rows to keep
keep = (
 df[df['Surg Type'].eq('X')]
 .assign(date=lambda d: d['Medical Appt Date'].fillna(d['Surg Date']),
         surgery=lambda d: d['Medical Appt Date'].isna()
        ) 
 .sort_values(by=group_cols+['date'])
 .groupby(group_cols, sort=False)
 ['surgery'].rolling(2*N+1, center=True, min_periods=1)
 .max().astype(bool)
 .droplevel(group_cols)
)

# select the rows from the above list of indices to keep
out = df.loc[keep.index[keep]]

Output:

   Patient ID  Surg ID Surg Type  Surg Date Medical Appt Medical Appt Date
0           1        1         X 2022-09-03            Y        2022-01-01
1           1        1         X 2022-09-03            Y        2022-03-04
2           1        1         X 2022-09-03            Y        2022-05-04
3           1        1         X 2022-09-03            N               NaT
4           1        1         X 2022-09-03            Y        2022-11-04
5           1        1         X 2022-09-03            Y        2022-11-29
7           1        1         X 2022-09-03            Y        2023-01-02
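
The output above selects the right rows but does not yet carry the signed order column the question asks for. One possible way to add it is sketched below, using `groupby.cumcount` on each side of the surgery instead of a rolling window. It is not part of the original answer. The column name `Inclusion` is taken from the question's desired table, `appts`, `forward`, `backward` and `order` are illustrative names, and an appointment falling exactly on the surgery date is assumed to count as "before".

```python
import pandas as pd

N = 3  # appointments to keep on each side of a surgery
group_cols = ["Patient ID", "Surg ID"]

# Data transcribed from the question's "Original Df" table.
df = pd.DataFrame({
    "Patient ID": 1,
    "Surg ID": [1, 1, 1, 1, 1, 1, 2, 1, 1],
    "Surg Type": ["X", "X", "X", "X", "X", "X", "Z", "X", "X"],
    "Surg Date": ["2022-09-03"] * 6 + ["2022-12-01"] + ["2022-09-03"] * 2,
    "Medical Appt": ["Y", "Y", "Y", "N", "Y", "Y", "N", "Y", "Y"],
    "Medical Appt Date": ["2022-01-01", "2022-03-04", "2022-05-04", None,
                          "2022-11-04", "2022-11-29", None,
                          "2023-01-02", "2023-01-13"],
})
date_cols = ["Surg Date", "Medical Appt Date"]
df[date_cols] = df[date_cols].apply(pd.to_datetime)

# Keep only X surgeries; rows without an appointment date are the surgery rows.
x = df[df["Surg Type"].eq("X")]

# Dated appointments only, in chronological order within each surgery.
# Assumption: an appointment on the surgery date itself counts as "before".
appts = (
    x.dropna(subset=["Medical Appt Date"])
     .sort_values(group_cols + ["Medical Appt Date"])
     .assign(after=lambda d: d["Medical Appt Date"] > d["Surg Date"])
)

# Number the appointments separately on each side of the surgery.
g = appts.groupby(group_cols + ["after"])
forward = g.cumcount() + 1                  # 1, 2, 3, ... from the earliest date
backward = g.cumcount(ascending=False) + 1  # ..., 3, 2, 1 ending at the latest date

# Positive order after the surgery, negative before; NaN on the surgery row.
order = forward.where(appts["after"], -backward).reindex(x.index)

# Keep the surgery row plus everything within N appointments of it.
keep = order.isna() | order.abs().le(N)
out = x.assign(Inclusion=order.astype("Int64"))[keep]
```

On the sample data this keeps the same seven rows as the rolling approach, with `Inclusion` running from -3 to 3 and left missing on the surgery row itself. The nullable `Int64` dtype is used only so the missing value does not turn the column into floats.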