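In R, you can compute a rolling mean with a specified window that shifts by a specified amount each time.
Maybe I just haven't found it anywhere, but it seems there is no way to do this in pandas or any other Python library?
Does anyone know a way around this? Here is an example of what I mean:
I have semi-monthly data, and I am computing a two-month moving average that shifts by one month each time.
So in R I would do the following: two_month__movavg=rollapply(mydata,4,mean,by = 2,na.pad = FALSE)
Is there no equivalent in Python?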
EDIT1:
                  DATE    A DEMAND  ...   AA DEMAND  A Price
0  2006/01/01 00:30:00  8013.27833  ...  5657.67500    20.03
1  2006/01/01 01:00:00  7726.89167  ...  5460.39500    18.66
2  2006/01/01 01:30:00  7372.85833  ...  5766.02500    20.38
3  2006/01/01 02:00:00  7071.83333  ...  5503.25167    18.59
4  2006/01/01 02:30:00  6865.44000  ...  5214.01500    17.53
You can still use rolling; you just need a little extra work assigning the result by index.
Here by = 2:
by = 2
df.loc[df.index[np.arange(len(df))%by==1],'New']=df.Price.rolling(window=4).mean()
df
Price New
0 63 NaN
1 92 NaN
2 92 NaN
3 5 63.00
4 90 NaN
5 3 47.50
6 81 NaN
7 98 68.00
8 100 NaN
9 58 84.25
10 38 NaN
11 15 52.75
12 75 NaN
13 19 36.75
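The `% by == 1` test above works because the first complete window ends at position 3 (`window - 1`), which happens to be odd. Below is a minimal, self-contained sketch that derives the offset from the window size instead. The `keep` mask and the use of `Series.where` are my own choices for illustration, not part of the answer above, and the prices are copied from the output shown.

```python
import numpy as np
import pandas as pd

# Prices copied from the example output above
df = pd.DataFrame({"Price": [63, 92, 92, 5, 90, 3, 81, 98, 100, 58, 38, 15, 75, 19]})

window, by = 4, 2
rolled = df["Price"].rolling(window=window).mean()

# Keep the first complete window (position window - 1) and every `by`-th one after it
keep = (np.arange(len(df)) - (window - 1)) % by == 0
df["New"] = rolled.where(keep)
print(df)
```

With `window = 4` and `by = 2` this should reproduce the `New` column shown above; for other combinations the mask moves with `window - 1` rather than relying on the parity of the index.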
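This is overkill for a 1D array of data, but you can simplify it and pull out what you need. Since pandas relies on NumPy, you may want to check how its rolling/strided function is implemented, if it has one. Here are the results for 20 sequential numbers, with a 7-day window and a stride of 2: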
z = np.arange(20)
z #array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
s = stride(z, (7,), (2,))
np.mean(s, axis=1) # array([ 3., 5., 7., 9., 11., 13., 15.])
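Here is the code I use, without the main portion of the documentation. It is derived from the many implementations of strided functions in NumPy that can be found on this site. There are variants and incarnations; this is just another one.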
import numpy as np

def stride(a, win=(3, 3), stepby=(1, 1)):
    """Provide a 2D sliding/moving view of an array.
    There is no edge correction for outputs. Use the `pad_` function first."""
    err = """Array shape, window and/or step size error.
    Use win=(3,) with stepby=(1,) for 1D array
    or win=(3,3) with stepby=(1,1) for 2D array
    or win=(1,3,3) with stepby=(1,1,1) for 3D
    ---- a.ndim != len(win) != len(stepby) ----
    """
    from numpy.lib.stride_tricks import as_strided
    a_ndim = a.ndim
    if isinstance(win, int):
        win = (win,) * a_ndim
    if isinstance(stepby, int):
        stepby = (stepby,) * a_ndim
    assert (a_ndim == len(win)) and (len(win) == len(stepby)), err
    shp = np.array(a.shape)    # array shape (r, c) or (d, r, c)
    win_shp = np.array(win)    # window (3, 3) or (1, 3, 3)
    ss = np.array(stepby)      # step by (1, 1) or (1, 1, 1)
    newshape = tuple(((shp - win_shp) // ss) + 1) + tuple(win_shp)
    newstrides = tuple(np.array(a.strides) * ss) + a.strides
    a_s = as_strided(a, shape=newshape, strides=newstrides, subok=True).squeeze()
    return a_s
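On NumPy 1.20 or later, `numpy.lib.stride_tricks.sliding_window_view` covers the 1D case without a hand-written helper. Here is a short sketch, assuming that NumPy version is available. It builds a view of every window and then keeps every second one, so it is an alternative to the `stride` function above rather than a copy of it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

z = np.arange(20)
window, step = 7, 2

# View of all 14 windows of length 7, then every `step`-th window (7 remain)
windows = sliding_window_view(z, window)[::step]
means = windows.mean(axis=1)
print(means)  # window means 3, 5, 7, ..., 15
```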
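I failed to point out that you can create an output array and append it as a column in pandas. Going back to the original definitions used above: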
nans = np.full_like(z, np.nan, dtype='float') # z is the 20 number sequence
means = np.mean(s, axis=1) # results from the strided mean
# assign the means to the output array skipping the first and last 3 and striding by 2
nans[3:-3:2] = means
nans # array([nan, nan, nan, 3., nan, 5., nan, 7., nan, 9., nan, 11., nan, 13., nan, 15., nan, nan, nan, nan])
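To make that last step concrete, here is a hedged sketch that places each window mean at the centre of its window and stores the result as a DataFrame column. The column names `z` and `centered_mean` are made up for illustration, the window length is assumed to be odd so that a centre element exists, and `sliding_window_view` (NumPy 1.20+) stands in for the `stride` helper.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

z = np.arange(20)
window, step = 7, 2  # odd window assumed, so each window has a centre element

means = sliding_window_view(z, window)[::step].mean(axis=1)

# Position of each window's centre: half, half + step, half + 2 * step, ...
half = window // 2
centres = half + step * np.arange(len(means))

out = np.full(len(z), np.nan)
out[centres] = means

df = pd.DataFrame({"z": z, "centered_mean": out})  # hypothetical column names
print(df)
```

Computing the centre positions from the number of windows avoids hard-coding the `3:-3:2` slice, which only fits a 7-element window with a step of 2.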
If the data is not too large, here is an easy way:
by = 2
win = 4
start = 3 ## the index of your first valid value (win - 1)
df.rolling(win).mean()[start::by] ## calculate everything, then choose what you need
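The same idea can be wrapped in a small function, sketched below. `rolling_mean_by` is a hypothetical name, not a pandas API. It returns only the complete windows, which is roughly what `na.pad = FALSE` asks for in the R call from the question; the sample prices are borrowed from the first answer.

```python
import pandas as pd

def rolling_mean_by(series, window, by):
    """Rolling mean over `window` rows, keeping every `by`-th complete window."""
    full = series.rolling(window).mean()   # computes every window
    return full.iloc[window - 1::by]       # first complete window, then every `by`-th

prices = pd.Series([63, 92, 92, 5, 90, 3, 81, 98, 100, 58, 38, 15, 75, 19])
result = rolling_mean_by(prices, window=4, by=2)
print(result)
```

Because every window is computed and most are then discarded, this does more work than a truly strided computation, which is why it suits data that is not too large.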