Appending a time series to a data frame


I have a data frame that looks like this:

"2023-09-07 13:22" type1 12.7
"2023-09-07 14:07" type2 101.1 

and a second data frame holding a regularly spaced time series for each type:

                   type1     type2
2023-09-07 08:00       1         2
2023-09-07 08:15       3         4
2023-09-07 08:30       5         6
...
2023-09-07 13:15       7         8
2023-09-07 13:30       9        10      
2023-09-07 13:45      11        12
2023-09-07 14:00      13        14
2023-09-07 14:15      15        16
2023-09-07 14:30      17        18
...

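For each row of the first data frame, I want to append (as extra columns on that row) 2 (or N) values from the second data frame, starting at the first timestamp after the row's own timestamp.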

So in this case the answer is

"2023-09-07 13:22" type1 12.7    9 11
"2023-09-07 14:07" type2 101.1  16 18

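I can loop over the rows of the first data frame and look up a slice of the second data frame each time, but that is very slow. I am wondering whether there is a better solution. It seems like a very common task.

Thank you.

Code to generate the input data frames: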

import pandas as pd

# First data frame: one row per event, indexed by its timestamp.
df1 = pd.DataFrame(columns = ["date", "type", "val"])
df1.loc[0] = [pd.to_datetime("2023-09-07 13:22:00"), "type1", 12.7]
df1.loc[1] = [pd.to_datetime("2023-09-07 14:07:00"), "type2", 101.1]
df1 = df1.set_index("date")
# Second data frame: regularly spaced series, one column per type.
df2 = pd.DataFrame()
df2["date"] = pd.to_datetime(["2023-09-07 08:00", "2023-09-07 08:15","2023-09-07 08:30", "2023-09-07 13:15","2023-09-07 13:30", "2023-09-07 13:45","2023-09-07 14:00", "2023-09-07 14:15","2023-09-07 14:30"])
df2["type1"] = [1,3,5,7,9,11,13,15,17]
df2["type2"] = [2,4,6,8,10,12,14,16,18]
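
For comparison, the row-by-row loop described above might look like the following minimal sketch. It rebuilds a shortened copy of the sample data so that it runs on its own. The names `later` and `loop_result` are illustrative, and it assumes "after" means strictly later than the row's timestamp.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"type": ["type1", "type2"], "val": [12.7, 101.1]},
    index=pd.to_datetime(["2023-09-07 13:22", "2023-09-07 14:07"]),
)
df2 = pd.DataFrame(
    {"type1": [7, 9, 11, 13, 15, 17], "type2": [8, 10, 12, 14, 16, 18]},
    index=pd.to_datetime([
        "2023-09-07 13:15", "2023-09-07 13:30", "2023-09-07 13:45",
        "2023-09-07 14:00", "2023-09-07 14:15", "2023-09-07 14:30",
    ]),
)

N = 2
loop_result = []
for ts, row in df1.iterrows():
    # Keep only timestamps strictly later than this row's timestamp,
    # restricted to the column named by the row's "type".
    later = df2.loc[df2.index > ts, row["type"]]
    # Take the first N values (fewer if the series runs out).
    loop_result.append(later.iloc[:N].tolist())

print(loop_result)
```

Each iteration builds a boolean mask over the whole of `df2`, which is why this approach gets slow as the first data frame grows.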

Tags: python pandas merge

1 Answer

You can try pd.merge_asof, followed by slicing as the next step:

Input data frames (sorted by index):

df1

                      type  value
time                             
2023-09-07 13:22:00  type1   12.7
2023-09-07 14:07:00  type2  101.1

df2

                     type1  type2
time                             
2023-09-07 08:00:00      1      2
2023-09-07 08:15:00      3      4
2023-09-07 08:30:00      5      6
2023-09-07 13:15:00      7      8
2023-09-07 13:30:00      9     10
2023-09-07 13:45:00     11     12
2023-09-07 14:00:00     13     14
2023-09-07 14:15:00     15     16
2023-09-07 14:30:00     17     18

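import pandas as pd

N = 2  # number of values to append per row

# Keep each df2 timestamp as a regular column so it survives the merge.
df2["time_tmp"] = df2.index
# For every row of df1, find the first df2 row at or after its timestamp.
tmp = pd.merge_asof(df1, df2, left_index=True, right_index=True, direction="forward")
# Slice df2 from the matched timestamp onward and take N values from the column named by "type".
df1[list(range(N))] = tmp.apply(
    lambda x: df2.loc[x["time_tmp"] :, x["type"]][:N].values,
    axis=1,
    result_type="expand",
)
print(df1)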

Prints:

                      type  value   0   1
time                                     
2023-09-07 13:22:00  type1   12.7   9  11
2023-09-07 14:07:00  type2  101.1  16  18
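
To make the intermediate step easier to follow, here is a small self-contained sketch of what the forward `merge_asof` matches before any slicing happens. The data is a shortened copy of the frames above, and `matched` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"type": ["type1", "type2"], "value": [12.7, 101.1]},
    index=pd.DatetimeIndex(["2023-09-07 13:22", "2023-09-07 14:07"], name="time"),
)
df2 = pd.DataFrame(
    {"type1": [7, 9, 11, 13, 15, 17], "type2": [8, 10, 12, 14, 16, 18]},
    index=pd.DatetimeIndex(
        [
            "2023-09-07 13:15", "2023-09-07 13:30", "2023-09-07 13:45",
            "2023-09-07 14:00", "2023-09-07 14:15", "2023-09-07 14:30",
        ],
        name="time",
    ),
)
# Copy the index into a column so the matched timestamp is visible after the merge.
df2["time_tmp"] = df2.index

# direction="forward" pairs each df1 row with the first df2 row whose
# timestamp is at or after the df1 timestamp.
tmp = pd.merge_asof(df1, df2, left_index=True, right_index=True, direction="forward")
matched = tmp["time_tmp"].tolist()
print(tmp)
```

Here 13:22 is matched to 13:30 and 14:07 to 14:15. By default an exactly equal timestamp also counts as a match. If "after" should mean strictly later, passing `allow_exact_matches=False` to `pd.merge_asof` is one way to get that.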

EDIT: a "safer" version that uses np.pad to fill with -1 when fewer than N values are available for a row:

import numpy as np

def fn(row):
    # Values from the matched timestamp onward, in the column named by "type".
    vals = df2.loc[row["time_tmp"] :, row["type"]][:N].values
    if len(vals) < N:
        # Right-pad with -1 so every row yields exactly N values.
        vals = np.pad(
            vals, mode="constant", pad_width=(0, N - len(vals)), constant_values=-1
        )
    return vals


df2["time_tmp"] = df2.index
tmp = pd.merge_asof(df1, df2, left_index=True, right_index=True, direction="forward")
df1[list(range(N))] = tmp.apply(
    fn,
    axis=1,
    result_type="expand",
)
print(df1)
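
If the row-wise `apply` is still too slow on large inputs, one possible alternative (not part of the approach above, only a sketch) is to compute the start positions with `np.searchsorted` and gather all values with NumPy indexing. It assumes `df2` is sorted by index and that every value in `df1["type"]` is a column of `df2`. The names `FILL`, `start`, `pos`, `out` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"type": ["type1", "type2"], "value": [12.7, 101.1]},
    index=pd.DatetimeIndex(["2023-09-07 13:22", "2023-09-07 14:07"], name="time"),
)
df2 = pd.DataFrame(
    {"type1": [7, 9, 11, 13, 15, 17], "type2": [8, 10, 12, 14, 16, 18]},
    index=pd.DatetimeIndex(
        [
            "2023-09-07 13:15", "2023-09-07 13:30", "2023-09-07 13:45",
            "2023-09-07 14:00", "2023-09-07 14:15", "2023-09-07 14:30",
        ],
        name="time",
    ),
)

N = 3
FILL = -1  # padding value, chosen to match the np.pad version

# Position in df2 of the first timestamp at or after each df1 timestamp.
# Use side="right" instead to require a strictly later timestamp.
start = np.searchsorted(df2.index.values, df1.index.values, side="left")

# Row positions start, start+1, ..., start+N-1 for each df1 row: shape (len(df1), N).
pos = start[:, None] + np.arange(N)
valid = pos < len(df2)  # False where the series has run out

# Column position in df2 for each df1 row's type.
col = df2.columns.get_indexer(df1["type"].to_numpy())

values = df2.to_numpy()
out = np.full(pos.shape, FILL, dtype=values.dtype)
row_idx, _ = np.nonzero(valid)  # df1 row number of every valid cell, row-major
out[valid] = values[pos[valid], col[row_idx]]

result = pd.concat([df1, pd.DataFrame(out, index=df1.index)], axis=1)
print(result)
```

With `N = 3`, the second row only has two values left in the series, so its third column is filled with -1, as in the padded version.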