Rolling mean on a pandas DataFrame, with each window centered on a time taken from another DataFrame

Question

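I have one DataFrame (df1) sampled roughly every 15 seconds and another (df2) sampled roughly every 5 minutes, both shown below. I want to match each "time" value in df2 to the closest timestamp in df1, then use that closest timestamp as the center of a 5-minute average of the df1 data.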

df1:

                  time  speed
0  2022-10-04 00:00:24  4.590
1  2022-10-04 00:00:41  4.389
2  2022-10-04 00:00:57  4.367
3  2022-10-04 00:01:14  4.539
4  2022-10-04 00:01:30  4.584
5  2022-10-04 00:01:48  4.523
6  2022-10-04 00:02:05  4.498
7  2022-10-04 00:02:21  4.625
8  2022-10-04 00:02:38  4.497
9  2022-10-04 00:02:54  4.406
10 2022-10-04 00:03:12  4.502
11 2022-10-04 00:03:28  4.494
12 2022-10-04 00:03:45  4.445
13 2022-10-04 00:04:01  4.438
14 2022-10-04 00:04:18  4.433
15 2022-10-04 00:04:36  4.441
16 2022-10-04 00:04:52  4.400
17 2022-10-04 00:05:09  4.221
18 2022-10-04 00:05:27  4.115
19 2022-10-04 00:05:43  4.009
20 2022-10-04 00:06:01  4.230
21 2022-10-04 00:06:18  4.360
22 2022-10-04 00:06:34  4.331
23 2022-10-04 00:06:51  4.178
24 2022-10-04 00:07:07  4.238
25 2022-10-04 00:07:25  4.125
26 2022-10-04 00:07:43  3.988
27 2022-10-04 00:08:17  3.573
28 2022-10-04 00:08:34  4.471
29 2022-10-04 00:08:50  4.567
30 2022-10-04 00:09:08  4.451
31 2022-10-04 00:09:25  4.311
32 2022-10-04 00:09:42  4.280
33 2022-10-04 00:09:59  4.439
34 2022-10-04 00:10:17  4.410
35 2022-10-04 00:10:35  4.335
36 2022-10-04 00:10:51  4.193
37 2022-10-04 00:11:08  4.140
38 2022-10-04 00:11:25  4.020
39 2022-10-04 00:11:43  3.872
40 2022-10-04 00:12:01  3.859
41 2022-10-04 00:12:17  4.062
42 2022-10-04 00:12:34  3.861
43 2022-10-04 00:12:51  3.780
44 2022-10-04 00:13:07  3.680
45 2022-10-04 00:13:25  3.909
46 2022-10-04 00:13:42  3.852
47 2022-10-04 00:13:58  3.867
48 2022-10-04 00:14:15  3.715
49 2022-10-04 00:14:32  3.534
50 2022-10-04 00:14:49  3.349
51 2022-10-04 00:15:06  3.213
52 2022-10-04 00:15:23  3.215
53 2022-10-04 00:15:39  3.246
54 2022-10-04 00:15:55  3.195
55 2022-10-04 00:16:14  3.164
56 2022-10-04 00:16:30  3.149
57 2022-10-04 00:16:47  3.281
58 2022-10-04 00:17:03  3.366
59 2022-10-04 00:17:20  3.295
60 2022-10-04 00:17:38  3.487
61 2022-10-04 00:17:54  3.534
62 2022-10-04 00:18:11  3.430
63 2022-10-04 00:18:27  3.474
64 2022-10-04 00:18:44  3.275
65 2022-10-04 00:19:01  3.584
66 2022-10-04 00:19:18  3.616
67 2022-10-04 00:19:34  3.506
68 2022-10-04 00:19:51  3.561
69 2022-10-04 00:20:08  3.316
70 2022-10-04 00:20:27  3.396
71 2022-10-04 00:20:43  3.536
72 2022-10-04 00:20:59  3.631
73 2022-10-04 00:21:16  3.573
74 2022-10-04 00:21:33  3.514
75 2022-10-04 00:21:50  3.603
76 2022-10-04 00:22:07  3.591
77 2022-10-04 00:22:23  3.591
78 2022-10-04 00:22:40  3.659
79 2022-10-04 00:23:14  4.056

df2:

                 time     speed
0 2022-10-03 00:03:23  4.646689
1 2022-10-03 00:08:24  5.328516
2 2022-10-03 00:13:24  5.895778
3 2022-10-03 00:18:24  5.665014
4 2022-10-03 00:22:25  6.313763

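What I know so far is that pandas.merge_asof, with its "tolerance" parameter, can match the times as closely as possible. From there I have tried several combinations of pandas.groupby and pandas.rolling, but I am still struggling to get the desired result. I am not sure how to proceed, so any help would be appreciated.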
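For reference, here is a minimal sketch of the merge_asof step mentioned above. The frames left and right and the column time_right are hypothetical toy data, not the poster's df1 and df2. It assumes both "time" columns are sorted datetimes, and it uses direction="nearest" so that the closest timestamp on either side is considered.

```python
import pandas as pd

# Hypothetical toy frames; both "time" columns must be sorted datetimes.
left = pd.DataFrame({
    "time": pd.to_datetime(["2022-10-04 00:03:23", "2022-10-04 00:08:24"]),
    "label": ["a", "b"],
})
right = pd.DataFrame({
    "time": pd.to_datetime(["2022-10-04 00:03:28", "2022-10-04 00:09:08"]),
    "speed": [4.494, 4.451],
})

# Copy right's timestamp into a separate column so the matched time
# is still visible after the merge (the "on" column keeps left's values).
matched = pd.merge_asof(
    left,
    right.assign(time_right=right["time"]),
    on="time",
    direction="nearest",                 # look both backward and forward
    tolerance=pd.Timedelta(seconds=15),  # reject matches further than 15 s away
)
print(matched)
```

Row "a" finds a neighbour 5 seconds away and receives its speed. Row "b" has no neighbour within 15 seconds, so its speed and time_right come back empty (NaN and NaT).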

python pandas datetime mean centering
1 answer

IIUC, you could try something like this:

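import pandas as pd

# Centered 5-minute rolling mean of df1["speed"].
# "time" must be a sorted datetime column.
df1_rolling_mean = (
    df1.rolling(window="300s", on="time", center=True)
    .mean()
    .rename(columns={"speed": "speed_avg"})
)
# Attach speed_avg to df1; "time" is the only shared column, so it is the join key.
df1 = pd.merge(df1, df1_rolling_mean)

# For each df1 row, pull in the df2 row at most 15 s earlier
# (merge_asof searches backward by default).
df = pd.merge_asof(
    df1,
    df2.rename(columns={"speed": "speed_df2"}),
    on="time",
    tolerance=pd.Timedelta(seconds=15),
)

# Keep only the df1 rows that found a df2 match.
df = df[df.loc[:, "speed_df2"].notna()]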

This gives:

                  time  speed  speed_avg  speed_df2
11 2022-10-04 00:03:28  4.494   4.421765   4.646689
28 2022-10-04 00:08:34  4.471   4.265625   5.328516
45 2022-10-04 00:13:25  3.909   3.687167   5.895778
63 2022-10-04 00:18:27  3.474   3.410000   5.665014
78 2022-10-04 00:22:40  3.659   3.615000   6.313763
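  • speed_avg is the 5-minute average of the speed column in df1;
  • when merging df2 onto df1 with merge_asof on the time column, we keep only the rows that have a common value for time, within a tolerance of 15 seconds.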
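One caveat: merge_asof searches backward by default, so the code above keeps the df1 sample taken up to 15 seconds after each df2 time, which is not always the closest one. If the closest timestamp on either side is wanted, a possible variant is to put df2 on the left and pass direction="nearest". The sketch below is only an illustration under stated assumptions: it uses a four-row excerpt of df1, assumes df2 falls on the same date as df1 (as the matched output above implies), and introduces the hypothetical column name time_df1.

```python
import pandas as pd

# Excerpt of df1 around 00:08 (values copied from the question).
df1 = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-10-04 00:07:43", "2022-10-04 00:08:17",
        "2022-10-04 00:08:34", "2022-10-04 00:08:50",
    ]),
    "speed": [3.988, 3.573, 4.471, 4.567],
})
# Assumption: df2 is on the same date as df1.
df2 = pd.DataFrame({
    "time": pd.to_datetime(["2022-10-04 00:08:24"]),
    "speed": [5.328516],
})

# Centered 5-minute mean of speed, one value per df1 timestamp.
speed_avg = (
    df1.set_index("time")["speed"]
    .rolling("300s", center=True)
    .mean()
    .rename("speed_avg")
    .reset_index()
)

# df2 on the left: exactly one output row per df2 timestamp,
# matched to the closest df1 timestamp within 15 s.
nearest = pd.merge_asof(
    df2.rename(columns={"speed": "speed_df2"}),
    speed_avg.assign(time_df1=speed_avg["time"]),  # keep the matched df1 time
    on="time",
    direction="nearest",
    tolerance=pd.Timedelta(seconds=15),
)
print(nearest)
```

Here the df2 time 00:08:24 is matched to 00:08:17 (7 seconds earlier) instead of 00:08:34 (10 seconds later). Because only an excerpt of df1 is used, speed_avg in this sketch differs from the value computed on the full data.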