I've been building a centered moving average function in R (without using any packages) and have run into the following problem:
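As you know, a centered moving average has to deal with the "incomplete" windows at the beginning and end of the data. For example, consider the vector p below: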
p <- c(10,20,30,40,50,60,70,80,90)
In this case, the centered moving average I'm after looks like this:
x <- c((10+20)/2, (10+20+30)/3, (20+30+40)/3, ..., (70+80+90)/3, (80+90)/2)
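Stated generally for a vector of length n and an odd window width w = 2k + 1, the target truncates each window at the ends of the vector and divides by the number of elements actually in it:

$$
y_i = \frac{1}{\min(n,\, i+k) - \max(1,\, i-k) + 1} \sum_{j=\max(1,\, i-k)}^{\min(n,\, i+k)} p_j, \qquad k = \frac{w-1}{2}
$$

With n = 9 and w = 3 this gives y_1 = (p_1 + p_2)/2, y_2 = (p_1 + p_2 + p_3)/3 and y_9 = (p_8 + p_9)/2, matching the vector above.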
To achieve this, I tried using if statements as follows (wd is the window size):
mov_avg <- function(p, wd) {
  x <- c(0, cumsum(p))
  if ((p > p[1])&(p < p[length(p)])) {
    neut <- 1:(length(p)-(wd-1))
    upper <- neut+(wd-1)
    x <- (x[upper]-x[neut])/(upper-neut)
  } else if (p==p[1]) {
    neut <- 0
    upper <- neut+3
    x <- (x[upper]-x[neut])/(upper-1-neut)
  } else if (p==p[length(p)]) {
    upper <-(length(p)+1)
    neut <- (length(p)-(wd-2))
    x <- (x[upper]-x[neut])/(upper-neut)
  }
  return(x)
}
I then ran it with:
mov_avg(p, 3)
and got the following result:
numeric(0)
Warning messages:
1: In if ((p > p[1]) & (p < p[length(p)])) { :
the condition has length > 1 and only the first element will be used
2: In if (p == p[1]) { :
the condition has length > 1 and only the first element will be used
Can someone help me turn this into a working function?
Thanks!
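For reference, the warnings come from if(), which expects a single TRUE or FALSE: p > p[1] and p == p[1] are logical vectors, so only their first elements were used (since R 4.2.0 this is an error rather than a warning). The branch that ran sets neut <- 0, and x[0] selects nothing, which is why the result is numeric(0). A minimal sketch of the same cumsum idea without any if(), assuming an odd window width wd; the name mov_avg_cumsum is hypothetical:

```r
# Centered moving average whose window shrinks at both ends (base R only).
# Hypothetical helper; assumes an odd window width wd.
mov_avg_cumsum <- function(p, wd) {
  stopifnot(wd %% 2 == 1)
  n <- length(p)
  k <- (wd - 1) %/% 2              # half-width of the window
  cs <- c(0, cumsum(p))            # cs[j + 1] == sum(p[1:j])
  i <- seq_len(n)
  lo <- pmax(1, i - k)             # first index of each (possibly truncated) window
  hi <- pmin(n, i + k)             # last index of each window
  (cs[hi + 1] - cs[lo]) / (hi - lo + 1)
}

p <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
mov_avg_cumsum(p, 3)
#[1] 15 20 30 40 50 60 70 80 85
```

Cumulative sums turn every window into a single subtraction, which keeps the function fully vectorized; for very long or large-magnitude series the subtraction can accumulate some floating-point error.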
We can also use rowMeans:
rowMeans(embed(c(NA, p, NA), 3)[, 3:1], na.rm = TRUE)
#[1] 15 20 30 40 50 60 70 80 85
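embed(x, 3) returns a matrix whose row i is x[i + 2], x[i + 1], x[i], so after padding with one NA on each side every row is one centered window; the [, 3:1] only restores the original order and does not change the means. A sketch generalizing this to any odd window width by padding with (wd - 1) / 2 NAs on each side (centered_mean_embed is a hypothetical name):

```r
centered_mean_embed <- function(p, wd) {
  stopifnot(wd %% 2 == 1)
  k <- (wd - 1) %/% 2
  pad <- rep(NA, k)                       # k NAs on each side
  # each row of the embedded matrix is one window; na.rm drops the padding
  rowMeans(embed(c(pad, p, pad), wd), na.rm = TRUE)
}

p <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
centered_mean_embed(p, 3)
#[1] 15 20 30 40 50 60 70 80 85
centered_mean_embed(p, 5)
#[1] 20 25 30 40 50 60 70 75 80
```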
How about something like this in base R:
window <- 3
p <- c(10,20,30,40,50,60,70,80,90)
x <- c(NA, p, NA)
sapply(seq_along(x[-(1:(window - 1))]), function(i)
mean(x[seq(i, i + window - 1)], na.rm = T))
#[1] 15 20 30 40 50 60 70 80 85
The trick is to pad the vector with flanking NAs and then use mean with na.rm = T.
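The effect of the padding can be checked on a single edge window:

```r
# na.rm = TRUE drops the NA padding before averaging, so an edge window
# is divided by the number of real values (2), not by the window width (3).
mean(c(NA, 10, 20), na.rm = TRUE)
#[1] 15
# Without na.rm, the NA propagates:
mean(c(NA, 10, 20))
#[1] NA
```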
I know you said "without using packages", but zoo::rollapply makes this even shorter:
library(zoo)
rollapply(c(NA, p, NA), 3, mean, na.rm = T)
#[1] 15 20 30 40 50 60 70 80 85
Another approach is to write a function whose window size can be adjusted through the window argument:
mov_avg <- function(p, window) {
  mean_number = numeric()
  index = 1
  while(index < length(p)) {
    if (index == 1 | index == length(p) - 1)
      mean_number = c(mean_number, mean(p[index:(index + window - 2)]))
    else
      mean_number = c(mean_number, mean(p[index:(index + window - 1)]))
    index = index + 1
  }
  mean_number
}
mov_avg(p, 3)
#[1] 15 30 40 50 60 70 80 85
mov_avg(p, 2)
#[1] 10 25 35 45 55 65 75 80
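Note that for the 9-element p this function returns 8 values and skips (10+20+30)/3 = 20: after index 1 it starts a full window at the current index instead of centering the window on it. A minimal loop sketch that centers each window and truncates it at both ends, assuming an odd window (mov_avg_centered is a hypothetical name):

```r
mov_avg_centered <- function(p, window) {
  stopifnot(window %% 2 == 1)
  n <- length(p)
  k <- (window - 1) %/% 2
  out <- numeric(n)                 # preallocated instead of grown with c()
  for (i in seq_len(n)) {
    # window centered on i, clipped to the valid index range 1..n
    out[i] <- mean(p[max(1, i - k):min(n, i + k)])
  }
  out
}

p <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
mov_avg_centered(p, 3)
#[1] 15 20 30 40 50 60 70 80 85
```

Preallocating out avoids copying the result vector on every iteration, which growing it with c() inside the loop does.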
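Take the row means of a three-column matrix whose columns are x, x shifted down by one and x shifted up by one; the gap at the head of the first shifted column is filled with the mean of the first two elements, and the gap at the tail of the second with the mean of the last two.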
x <- p  # the input vector from the question
apply(matrix(c(x,
               c((x[1] + x[2])/2, head(x, -1)),
               c(tail(x, -1), sum(tail(x, 2))/2)),
             ncol = 3),
      1, mean)
#[1] 15 20 30 40 50 60 70 80 85
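Why the ends come out right: in the first row the three values are x[1], (x[1] + x[2]) / 2 and x[2], whose mean is exactly (x[1] + x[2]) / 2, and symmetrically for the last row. A sketch of the same construction with named pieces, swapping apply(..., 1, mean) for rowMeans, which returns the same row means without an R-level call per row (lag1 and lead1 are names chosen here for illustration):

```r
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
# previous element; the first slot gets the mean of the first two elements
lag1  <- c((x[1] + x[2]) / 2, head(x, -1))
# next element; the last slot gets the mean of the last two elements
lead1 <- c(tail(x, -1), sum(tail(x, 2)) / 2)
m <- cbind(x, lag1, lead1)     # one row per position: current, previous, next
rowMeans(m)
#[1] 15 20 30 40 50 60 70 80 85
```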
Here is a fast vectorized approach (though in my benchmarks it is still slower than Akrun's):
> p=seq(10,90,10)
> o=outer(1:length(p),-1:1,"+")
> rowMeans(matrix(p[ifelse(o<1,NA,o)],length(p)),na.rm=T)
[1] 15 20 30 40 50 60 70 80 85
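Here o is a length(p) x 3 matrix whose row i holds the neighbour indices i - 1, i and i + 1; indices below 1 are set to NA explicitly, while p[10] is already NA because it is out of range. A sketch generalizing the offsets to -k:k for an odd window width (centered_mean_outer is a hypothetical name):

```r
centered_mean_outer <- function(p, wd) {
  stopifnot(wd %% 2 == 1)
  n <- length(p)
  k <- (wd - 1) %/% 2
  idx <- outer(seq_len(n), -k:k, "+")   # n x wd matrix of neighbour indices
  idx[idx < 1 | idx > n] <- NA          # positions outside p become NA
  # p[idx] returns the values column by column; reshape back to n rows
  rowMeans(matrix(p[idx], nrow = n), na.rm = TRUE)
}

p <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
centered_mean_outer(p, 3)
#[1] 15 20 30 40 50 60 70 80 85
centered_mean_outer(p, 5)
#[1] 20 25 30 40 50 60 70 75 80
```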
Benchmark:
v=sample(1e4)
vectorized=\(x,y){
o=outer(1:length(x),y,"+")
rowMeans(matrix(x[ifelse(o<1,NA,o)],length(x)),na.rm=T)
}
nonvectorized=\(x,y){
l=length(x)
sapply(1:l,\(i)mean(x[max(1,i-y):min(i+y,l)],na.rm=T))
}
mov_avg <- function(p, window) {
mean_number = numeric()
index = 1
while(index < length(p)) {
if (index == 1 | index == length(p) - 1)
mean_number = c(mean_number, mean(p[index:(index + window - 2)]))
else
mean_number = c(mean_number, mean(p[index:(index + window - 1)]))
index = index + 1
}
mean_number
}
bench=\(times,...){
arg=match.call(expand.dots=F)$...;l=length(arg);out=double(times*l)
rand=sample(rep(1:l,times))
n=1;for(x in arg[rand]){t1=Sys.time();eval.parent(x);t2=Sys.time()-t1;out[n]=t2;n=n+1}
setNames(out,sapply(arg[rand],\(x)gsub(" +"," ",paste(deparse(x),collapse=" "))))
}
b=bench(100,
vectorized(v,-1:1),
nonvectorized(v,1),
rowMeans(embed(c(NA,v,NA),3),na.rm=T),
apply(matrix(c(v,c((v[1]+v[2])/2,head(v,-1)),c(tail(v,-1),sum(tail(v,2))/2)),ncol=3),1,mean),
mov_avg(v,3),
zoo::rollapplyr(v,3,mean,align="center",partial=T),
zoo::rollapply(c(NA,v,NA),3,mean,na.rm=T),
slider::slide_dbl(v,mean,.before=1,.after=1)
)
o=sort(tapply(b,names(b),median))
writeLines(sprintf("%.1f %s",o/min(o),names(o)))
The output shows the median time over one hundred runs of each option, relative to the fastest one:
1.0 rowMeans(embed(c(NA, v, NA), 3), na.rm = T)
2.4 vectorized(v, -1:1)
69.1 slider::slide_dbl(v, mean, .before = 1, .after = 1)
75.1 apply(matrix(c(v, c((v[1] + v[2])/2, head(v, -1)), c(tail(v, -1), sum(tail(v, 2))/2)), ncol = 3), 1, mean)
96.8 nonvectorized(v, 1)
153.5 zoo::rollapply(c(NA, v, NA), 3, mean, na.rm = T)
167.6 zoo::rollapplyr(v, 3, mean, align = "center", partial = T)
393.0 mov_avg(v, 3)
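One caveat when reading these ratios: not every entry computes the same thing. mov_avg returns length(v) - 1 values (9999 here) and, as in its 9-element example, skips the window centered on the second element. A quick base R check, run before timing, that two of the other candidates agree (the seed is an assumption added only for reproducibility):

```r
set.seed(1)   # assumption: fixed seed, not part of the original benchmark
v <- sample(1e4)
# fastest option from the table above
ref <- rowMeans(embed(c(NA, v, NA), 3), na.rm = TRUE)
# truncated-window loop, the same computation as nonvectorized(v, 1)
alt <- vapply(seq_along(v), function(i)
  mean(v[max(1, i - 1):min(length(v), i + 1)]), numeric(1))
stopifnot(isTRUE(all.equal(ref, alt)))
length(ref)
#[1] 10000
```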