也许pandas.DataFrame.rolling不是执行此操作的最佳方法,请告诉我是否有更好的方法。
我想要的是在df上具有滚动窗口,并使窗口中df的所有列可用以进行各种计算。
我相信下面的代码非常接近我的目标,但是我很难理解代码中所述的索引问题。
首先x.index = RangeIndex(start = 0,stop = 2,step = 1),并且tmp_df正确选择df中的第一行和第二行(索引0和1)。对于最后一个x.index = RangeIndex(start = 4,stop = 6,step = 1),似乎iloc尝试在df中选择超出范围的索引6(df的索引为0至5)。
我想念什么?
谢谢您的任何建议。
import numpy as np
import pandas as pd
df = pd.DataFrame({'open': [7, 5, 10, 11,6,12],
'close': [6, 6, 11, 10,7,10],
'positive': [0, 1, 1, 0,1,0]},
)
def do_calculations_on_any_df_column_in_window(x,df):
print("index:",x.index)
tmp_df = df.iloc[x.index] # raises "ValueError: cannot set using a slice indexer with a different length than the value" when x.index = RangeIndex(start=4, stop=6, step=1) as df index goes from 0 to 5 only
# do calulations on any column in tmp_df, get result
result = 1 #dummyresult
return result
intervals = range(2, 10)
for i in intervals:
df['result_' + str(i)] = np.nan
res = df.rolling(i).apply(do_calculations_on_any_df_column_in_window, args=(df,), raw=False)
df['result_' + str(i)][1:] = res
print(df)
尝试此功能:
def calculate_on_rolling_window(df, win, col_names):
#final_df = pd.DataFrame() # stores the complete results
# calculate sd and mean for each tag
for i in range(len(col_names)):
current_column = col_names[i]
df[current_column + '_mean_' +str(win)] = (df[current_column].rolling(window=win).mean())
df[current_column + '_min_' +str(win)] = (df[current_column].rolling(window=win).min())
df[current_column + '_max_' +str(win)] = (df[current_column].rolling(window=win).max())
df = df.fillna(0)
return(df)
您得到此结果
col_names = df.columns
df_extended = calculate_on_rolling_window(df,2,col_names)
df_extended.head()
open close positive open_mean_2 open_min_2 open_max_2 close_mean_2 close_min_2 close_max_2 positive_mean_2 positive_min_2 positive_max_2
0 7 6 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 5 6 1 6.0 5.0 7.0 6.0 6.0 6.0 0.5 0.0 1.0
2 10 11 1 7.5 5.0 10.0 8.5 6.0 11.0 1.0 1.0 1.0
3 11 10 0 10.5 10.0 11.0 10.5 10.0 11.0 0.5 0.0 1.0
4 6 7 1 8.5 6.0 11.0 8.5 7.0 10.0 0.5 0.0 1.0