How can I attach a new cumulative ID to my groups only when they appear consecutively in my dataset?

Problem description

I have a dataset with group_ids. I want to attach a unique id to each group for as long as it stays present in my dataset, where "present" allows it to disappear for at most 5 seconds. If a group disappears for more than 5 seconds, it should get a new cumulative ID; if it stays present, or is gone for no more than 5 seconds, it should keep the same cumulative number.

Here is my dataset:

    group_id    dt2         unique_id
0   nan 2023-11-28 17:43:09.900628  1
1   1   2023-11-28 17:43:11.322793  2
2   1   2023-11-28 17:43:12.660818  2
3   1   2023-11-28 17:43:14.119043  2
4   1   2023-11-28 17:43:15.550513  2
5   2   2023-11-28 17:43:15.550513  3
6   3   2023-11-28 17:43:15.550513  4
7   4   2023-11-28 17:43:15.550513  5
8   1   2023-11-28 17:43:16.973557  6
9   2   2023-11-28 17:43:16.973557  7
10  3   2023-11-28 17:43:16.973557  8
11  4   2023-11-28 17:43:16.973557  9
12  1   2023-11-28 17:43:18.335619  10
13  2   2023-11-28 17:43:18.335619  11
14  3   2023-11-28 17:43:18.335619  12
15  4   2023-11-28 17:43:18.335619  13
16  1   2023-11-28 17:43:19.738230  14
17  2   2023-11-28 17:43:19.738230  15
18  3   2023-11-28 17:43:19.738230  16
19  4   2023-11-28 17:43:19.738230  17
20  1   2023-11-28 17:43:21.110693  18
21  2   2023-11-28 17:43:21.110693  19
22  1   2023-11-28 17:43:22.571257  20
23  2   2023-11-28 17:43:22.571257  21
24  1   2023-11-28 17:43:24.000589  22
25  1   2023-11-28 17:43:25.429940  22
26  2   2023-11-28 17:43:25.429940  23
27  1   2023-11-28 17:43:26.851142  24
28  2   2023-11-28 17:43:26.851142  25
29  1   2023-11-28 17:43:28.256274  26
30  nan 2023-11-28 17:43:29.617541  27
31  nan 2023-11-28 17:43:30.974490  27
32  nan 2023-11-28 17:43:32.360739  27
33  1   2023-11-28 17:43:33.730457  28
34  1   2023-11-28 17:43:35.270380  28

I tried this cumsum() approach to create my 'unique_id' column:

# start a new id whenever group_id changes from the previous row
df['unique_id'] = (df['group_id'].eq(0) | (df['group_id'] != df['group_id'].shift())).cumsum()

This is a step in the right direction, but it attaches a new cumulative value whenever the group changes from one row to the next, even for groups that are still present according to the datetime column. Is there a way to implement logic so that a new cumulative value is attached as unique_id only when the group_id has been absent for at least 5 seconds?
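
For reference, a minimal sketch of the per-group logic being asked for, assuming dt2 can be parsed as datetimes; the column names and the 5-second threshold come from the question, everything else is illustrative:

import pandas as pd

df['dt2'] = pd.to_datetime(df['dt2'])

# seconds since the same group_id last appeared (NaN on its first row)
gap = df.groupby('group_id', dropna=False)['dt2'].diff().dt.total_seconds()

# a group starts a new segment on its first appearance
# or after being absent for more than 5 seconds
new_segment = gap.isna() | gap.gt(5)

# per-group segment counter, then one id per (group_id, segment) pair,
# numbered in order of first appearance
segment = new_segment.astype(int).groupby(df['group_id'], dropna=False).cumsum()
df['unique_id'] = df.groupby([df['group_id'], segment], sort=False, dropna=False).ngroup() + 1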

python python-3.x pandas cumsum accumulate
1 Answer
import numpy as np
import pandas as pd

# make sure the timestamp column is a real datetime
df['dt2'] = pd.to_datetime(df['dt2'], errors='raise')

# simulate two gaps: push rows 12 and 20 forward by 10 seconds
df.loc[[12, 20], 'dt2'] += pd.to_timedelta(10, unit='s')

# elapsed seconds between consecutive rows
df['delta'] = df['dt2'].diff() / np.timedelta64(1, 's')

# bump the counter every time at least 5 seconds have elapsed
df['test'] = (df['delta'] >= 5).cumsum()

print(df)

Output:

   group_id                        dt2  unique_id      delta  test
0       nan 2023-11-28 17:43:09.900628          1        NaN     0
1         1 2023-11-28 17:43:11.322793          2   1.422165     0
2         1 2023-11-28 17:43:12.660818          2   1.338025     0
3         1 2023-11-28 17:43:14.119043          2   1.458225     0
4         1 2023-11-28 17:43:15.550513          2   1.431470     0
5         2 2023-11-28 17:43:15.550513          3   0.000000     0
6         3 2023-11-28 17:43:15.550513          4   0.000000     0
7         4 2023-11-28 17:43:15.550513          5   0.000000     0
8         1 2023-11-28 17:43:16.973557          6   1.423044     0
9         2 2023-11-28 17:43:16.973557          7   0.000000     0
10        3 2023-11-28 17:43:16.973557          8   0.000000     0
11        4 2023-11-28 17:43:16.973557          9   0.000000     0
12        1 2023-11-28 17:43:28.335619         10  11.362062     1
13        2 2023-11-28 17:43:18.335619         11 -10.000000     1
14        3 2023-11-28 17:43:18.335619         12   0.000000     1
15        4 2023-11-28 17:43:18.335619         13   0.000000     1
16        1 2023-11-28 17:43:19.738230         14   1.402611     1
17        2 2023-11-28 17:43:19.738230         15   0.000000     1
18        3 2023-11-28 17:43:19.738230         16   0.000000     1
19        4 2023-11-28 17:43:19.738230         17   0.000000     1
20        1 2023-11-28 17:43:31.110693         18  11.372463     2
21        2 2023-11-28 17:43:21.110693         19 -10.000000     2
22        1 2023-11-28 17:43:22.571257         20   1.460564     2
23        2 2023-11-28 17:43:22.571257         21   0.000000     2
24        1 2023-11-28 17:43:24.000589         22   1.429332     2
25        1 2023-11-28 17:43:25.429940         22   1.429351     2
26        2 2023-11-28 17:43:25.429940         23   0.000000     2
27        1 2023-11-28 17:43:26.851142         24   1.421202     2
28        2 2023-11-28 17:43:26.851142         25   0.000000     2
29        1 2023-11-28 17:43:28.256274         26   1.405132     2
30      nan 2023-11-28 17:43:29.617541         27   1.361267     2
31      nan 2023-11-28 17:43:30.974490         27   1.356949     2
32      nan 2023-11-28 17:43:32.360739         27   1.386249     2
33        1 2023-11-28 17:43:33.730457         28   1.369718     2
34        1 2023-11-28 17:43:35.270380         28   1.539923     2
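
The test column only marks global gaps of at least 5 seconds. To turn it into the requested unique_id, one possible final step (not part of the answer above) is to number each (test, group_id) pair in order of first appearance:

# hypothetical final step: one id per (gap segment, group) pair
df['unique_id'] = df.groupby(['test', 'group_id'], sort=False, dropna=False).ngroup() + 1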