Here is an example DataFrame containing execution counts per timestamp.
DateTime Execution
0 2023-04-03 07:00:00 11
1 2023-04-03 11:00:00 1
2 2023-04-03 12:00:00 1
3 2023-04-03 14:00:00 3
4 2023-04-03 18:00:00 1
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5080 entries, 0 to 5079
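For anyone who wants to reproduce this, the five rows shown above can be rebuilt as follows. This is only a sketch of the sample: the real frame has 5080 rows, and the assumption that `DateTime` is stored as `datetime64` (rather than as strings) is mine.

```python
import pandas as pd

# Only the five sample rows printed above; the full frame has 5080 rows.
df = pd.DataFrame(
    {
        # Assumption: DateTime is a datetime64 column, not plain strings.
        "DateTime": pd.to_datetime(
            [
                "2023-04-03 07:00:00",
                "2023-04-03 11:00:00",
                "2023-04-03 12:00:00",
                "2023-04-03 14:00:00",
                "2023-04-03 18:00:00",
            ]
        ),
        "Execution": [11, 1, 1, 3, 1],
    }
)
print(df)
```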
Below is the output I want to achieve:
DateTime Execution
0 2023-04-03 07:00:00 4
1 2023-04-03 08:00:00 4
2 2023-04-03 09:00:00 3
3 2023-04-03 11:00:00 1
4 2023-04-03 12:00:00 1
5 2023-04-03 14:00:00 3
6 2023-04-03 18:00:00 1
Only when a row has more than 4 executions should the excess be pushed into the following hours. No hour may hold more than 4.
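To make the rule concrete for a single row, here is a small plain-Python sketch. The helper name `split_count` is hypothetical and only illustrates how one count is cut into per-hour chunks; it is not part of pandas.

```python
def split_count(count: int, cap: int = 4) -> list:
    """Split one execution count into consecutive hourly chunks of at most `cap`."""
    full, rest = divmod(count, cap)  # number of full hours and the leftover
    return [cap] * full + ([rest] if rest else [])


print(split_count(11))  # the 07:00 row: three hourly chunks
print(split_count(3))   # the 14:00 row: stays in a single hour
```

So the 11 executions at 07:00 become 4, 4 and 3 at 07:00, 08:00 and 09:00, while rows at or below the cap stay where they are.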
Thanks again for the quick help.
That works for an even distribution; what I have in mind is an uneven distribution.
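Using asfreq/clip:
N = 4  # maximum executions per hour
# Assumes df["DateTime"] is already datetime64.
# Reindex to an hourly grid; hours missing from df become NaN rows.
asfreq = df.set_index("DateTime").asfreq("h")
out = (
    # Start a new group at every original (non-NaN) row.
    (grp := asfreq.groupby(asfreq["Execution"].notna().cumsum()))
    ["Execution"]
    .transform("last")              # broadcast the row's count over its group
    .sub(grp.cumcount() * N)        # subtract what earlier hours already took
    .clip(upper=N)                  # cap each hour at N
    .loc[lambda s: s.gt(0)]         # drop hours that received nothing
    .reset_index(name="Execution")
    .convert_dtypes()
)
Output:
DateTime Execution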
0 2023-04-03 07:00:00 4
1 2023-04-03 08:00:00 4
2 2023-04-03 09:00:00 3
3 2023-04-03 11:00:00 1
4 2023-04-03 12:00:00 1
5 2023-04-03 14:00:00 3
6 2023-04-03 18:00:00 1
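To see why this works, it may help to look at the intermediate columns one at a time. The following is a self-contained sketch on the five sample rows; the names `hourly`, `key` and `steps` are mine and only serve the illustration.

```python
import pandas as pd

N = 4
df = pd.DataFrame(
    {
        "DateTime": pd.to_datetime(
            ["2023-04-03 07:00", "2023-04-03 11:00", "2023-04-03 12:00",
             "2023-04-03 14:00", "2023-04-03 18:00"]
        ),
        "Execution": [11, 1, 1, 3, 1],
    }
)

# Hourly grid from 07:00 to 18:00; hours without a row hold NaN.
hourly = df.set_index("DateTime").asfreq("h")
# The key increases by one at every original row, so each original row
# and the empty hours that follow it share one group.
key = hourly["Execution"].notna().cumsum()
grp = hourly.groupby(key)

steps = pd.DataFrame(
    {
        "Execution": hourly["Execution"],
        "group": key,
        # The group's single non-NaN count, repeated on every hour of the group.
        "total": grp["Execution"].transform("last"),
    }
)
# What is still left to place when this hour is reached.
steps["remaining"] = steps["total"] - grp.cumcount() * N
# Cap at N; zero or negative values mean nothing is left for that hour.
steps["capped"] = steps["remaining"].clip(upper=N)
print(steps)

out = (
    steps.loc[steps["capped"] > 0, "capped"]
    .astype(int)
    .reset_index(name="Execution")
)
print(out)
```

One thing to keep in mind: a group ends as soon as the next original row begins, so overflow that would land on an hour already present in the data is dropped rather than added to it. On the sample data this never happens.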
Create a new DataFrame in which each row is repeated according to its Execution value, then group by hour, limiting each hour to at most 4 executions:
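import pandas as pd
import numpy as np

N = 4  # maximum executions per hour
df['DateTime'] = pd.to_datetime(df['DateTime'])
# Repeat each row once per execution, so every row stands for one execution.
rep = df.loc[np.repeat(df.index.values, df['Execution'])]
# Position of each execution within its original row: 0, 1, 2, ...
pos = rep.groupby(level=0).cumcount()
# Shift every further block of N executions one more hour forward.
rep = rep.assign(DateTime=rep['DateTime'] + pd.to_timedelta(pos // N, unit='h'))
# Count the executions that landed in each hour.
# Note: if a shifted hour already exists in the data, the counts add up
# and that hour can exceed N.
out = rep.groupby('DateTime', as_index=False)['Execution'].count()
print(out)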
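If spilled executions can land on an hour that already has its own executions, one possible way to still respect the cap is to keep carrying the excess forward. The sketch below assumes that this is the desired behaviour (the question does not say so explicitly) and uses a plain loop; `cap_with_carry` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def cap_with_carry(counts: pd.Series, cap: int = 4) -> pd.Series:
    """Cap every hour at `cap` and carry the excess into the following hours.

    `counts` is assumed to be a non-empty Series of integer counts indexed
    by sorted, unique, hourly-aligned timestamps.
    """
    hourly = counts.asfreq("h", fill_value=0)  # one slot per hour, 0 where absent
    capped = {}
    carry = 0
    for ts, n in hourly.items():
        total = carry + int(n)
        capped[ts] = min(total, cap)
        carry = total - capped[ts]
    # Anything still left spills past the last timestamp in the data.
    ts = hourly.index[-1]
    while carry > 0:
        ts += pd.Timedelta(hours=1)
        capped[ts] = min(carry, cap)
        carry -= capped[ts]
    out = pd.Series(capped, name=counts.name)
    return out[out > 0]  # drop hours that ended up empty


# Made-up input in which the overflow from 07:00 collides with 08:00.
counts = pd.Series(
    [11, 3],
    index=pd.to_datetime(["2023-04-03 07:00:00", "2023-04-03 08:00:00"]),
    name="Execution",
)
result = cap_with_carry(counts)
print(result)
```

With a frame like the one in the question this would be called as `cap_with_carry(df.set_index("DateTime")["Execution"])`. The loop trades some speed for clarity, which should be acceptable for a few thousand rows.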