I am trying to write a function that computes a gradient in R. The function must specifically use a for loop to do so; I am writing it to illustrate how a for loop is less efficient than vectorized programming when computing a gradient.
The function takes a design matrix X and a vector of coefficients beta, and computes the gradient of the cost function (the design matrix is the covariate matrix with a leading column of ones as its first column).
The cost function is the MSE (up to the constant $1/n$, which I drop, so effectively the sum of squared errors),
$$L(\beta) = \sum_{i=1}^{n} \left(y_i - x_i^{T}\beta\right)^2.$$
I am computing the gradient analytically, taking the partial derivatives of the loss function. This gives, for the first coefficient $\beta_{0}$,
$$\frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^{n} \left(y_i - x_i^{T}\beta\right),$$
and then similarly for all the other coefficients $\beta_{j}$,
$$\frac{\partial L}{\partial \beta_j} = -2\sum_{i=1}^{n} x_{ij}\left(y_i - x_i^{T}\beta\right).$$
I managed to compute the answer (implementing the steps above) using vectorized programming, as the line of code below:
-2*t(X)%*%(y-X%*%beta)
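As a quick sanity check (a sketch on small simulated data; the variable names here are illustrative and shadow the ones defined later in the post), the vectorized one-liner can be verified against coordinate-wise sums of the partial derivatives above:

```r
# Sketch: verify the vectorized gradient against coordinate-wise sums
# on small simulated data (all names here are illustrative).
set.seed(1)
n <- 5; p <- 3
X <- cbind(1, matrix(runif(n * p, -5, 5), nrow = n))  # design matrix with intercept column
beta <- matrix(c(2, 3, 1, 4), nrow = p + 1)           # coefficients
y <- rowMeans(X[, -1])                                # some response

res <- y - X %*% beta                                 # residuals
grad_vec <- -2 * t(X) %*% res                         # vectorized gradient

# Element-wise: the j-th partial is -2 * sum_i x_ij * (y_i - x_i' beta)
grad_sum <- sapply(seq_len(ncol(X)), function(j) -2 * sum(X[, j] * res))

all.equal(as.numeric(grad_vec), grad_sum)             # TRUE
```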
I tried to do the same with a for loop. It did not work, but my attempt looked like this:
# Initialize the matrix to hold gradient
gradient = matrix(0, nrow = nrow(beta))
for (i in 1:nrow(beta)) {
  if (i == 1) {
    gradient[i] = -2*sum(y - X%*%beta) # first value
  } else if (i > 1) {
    gradient[i] = -2*sum( t(X)%*%(y - X%*%beta) ) * apply(X[,-1],2,sum)[i-1]
  }
}
Below is the code that generates the data, along with the two implementations I have tried in R. It can be copied into R to reproduce the for-loop problem.
# Randomly generated data
n = 4
p = 3
beta_0 = 2
beta = matrix(c(beta_0,3,1,4), nrow= (p+1) ) # coefficients
X = runif(n=(n*p), min=-5,max= 5) # covariates
X = matrix(X, nrow = n, ncol = p)
X = cbind(1, X) # make a design matrix
y = apply(X[,-1],1,mean) # Response (Some function) #
# Print all to show all initial values
print(list("Initial Values:"="","n:"=n," p:" = p, "beta_0:" = beta_0," beta:"=beta,
" X:"=X," y:" = y))
# Function 1
# Find the gradient (using a for loop)
# The partial derivative of the loss function
df_ols = function(beta, X, y){
  begin.time <- proc.time()
  # Initialize the matrix to hold gradient
  gradient = matrix(0, nrow = nrow(beta))
  for (i in 1:nrow(beta)) {
    if (i == 1) {
      gradient[i] = -2*sum(y - X%*%beta) # first value
    } else if (i > 1) {
      gradient[i] = -2*sum( t(X)%*%(y - X%*%beta) ) * apply(X[,-1],2,sum)[i-1]
    }
  }
  end.time <- proc.time()
  time <- (end.time - begin.time)[3]
  print(list("gradient 1" = gradient, "time" = time))
}
df_ols(beta,X,y)
# Function 2
# Find the gradient, approach 2, using vectorized programming
gradient_3 <- function(X, beta, y){ # pass y explicitly rather than relying on the global
  begin.time <- proc.time()
  # Finding the gradient
  grad_3 <- -2*t(X)%*%(y - X%*%beta)
  grad_3 <- matrix(grad_3, ncol = 1, nrow = ncol(X)) # turn into a column matrix
  end.time <- proc.time()
  time <- (end.time - begin.time)[3]
  print(list("gradient 3" = grad_3, "time" = time))
}
gradient_3(X, beta, y) # Assumed correct
I apologize if I have been too wordy. Any help would be appreciated.
I managed to get the for loop to work; below you will see the answer.
# Find the gradient (using a for loop): explicit, element-wise formulation
# The partial derivative of the loss function
df_ols = function(beta, X, y){
  begin.time <- proc.time()
  # Initialize the matrix to hold gradient
  gradient = matrix(0, nrow = nrow(beta))
  for (i in 1:nrow(beta)) {
    if (i == 1) {
      gradient[i] = -2*sum(y - X%*%beta) # first value (intercept partial)
    } else if (i > 1) {
      gradient[i] = -2*t(X[,i])%*%(y - X%*%beta) # i-th partial derivative
      #gradient[i] = -2*sum( t(X)%*%(y - X%*%beta) ) * apply(X[,-1],2,sum)[i-1]
    }
  }
  end.time <- proc.time()
  time <- (end.time - begin.time)[3]
  print(list("gradient 1" = gradient, "time" = time))
}
df_ols(beta,X,y)
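One side note: `df_ols` prints its result rather than returning it, which makes it awkward to compose or test, and it also recomputes `X %*% beta` on every loop iteration. A returning variant (a sketch; `df_ols_ret` is an illustrative name, not from the post) might look like:

```r
# Sketch: a returning variant of the loop-based gradient, with the
# residuals computed once outside the loop (df_ols_ret is illustrative).
df_ols_ret <- function(beta, X, y) {
  gradient <- numeric(nrow(beta))
  res <- y - X %*% beta                    # residuals, computed once
  gradient[1] <- -2 * sum(res)             # intercept partial
  for (i in 2:nrow(beta)) {
    gradient[i] <- -2 * sum(X[, i] * res)  # i-th partial derivative
  }
  matrix(gradient, ncol = 1)               # column matrix, like the original
}

# Usage: the returned value can now be compared directly, e.g.
# all.equal(df_ols_ret(beta, X, y), -2 * t(X) %*% (y - X %*% beta))
```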
Running both the vectorized and the element-wise formulations, we find that the vectorized procedure is much faster. We run the complete procedure below.
# Random data Generated
n = 100000*12
p = 3
beta_0 = 2
beta = matrix(c(beta_0,3,1,4), nrow= (p+1) ) # coefficients
X = runif(n=(n*p), min=-5,max= 5) # covariates
X = matrix(X, nrow = n, ncol = p)
X = cbind(1, X) # make a design matrix
y = apply(X[,-1],1,mean) # Response (Some function)
# Print parameters. To show all initial values
print(list("Initial Values:" = '',"n:"=n," p:" = p, "beta_0:" = beta_0," beta:"=beta,
" X:"=round(X,digits=2)," y:" = round(y,digits=2)))
# Function 3
# Find the gradient Approach 3, using vectorized programming
gradient_3 <- function(X, beta, y){
  begin.time <- proc.time()
  # Find all the partial derivatives at once
  grad_3 <- -2*t(X)%*%(y - X%*%beta)
  grad_3 <- matrix(grad_3, ncol = 1, nrow = ncol(X)) # turn into a column matrix
  end.time <- proc.time()
  time <- (end.time - begin.time)[3]
  print(list("gradient 3" = grad_3, "time" = time))
}
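For large n, the vectorized version can be pushed a little further: `crossprod(X, r)` computes `t(X) %*% r` without explicitly materializing the transpose, which is typically faster for tall matrices (a sketch; `gradient_crossprod` is an illustrative name, and timings will vary by machine):

```r
# Sketch: crossprod() avoids forming t(X) explicitly and is usually
# faster than t(X) %*% r for tall X (gradient_crossprod is illustrative).
gradient_crossprod <- function(X, beta, y) {
  -2 * crossprod(X, y - X %*% beta)  # (p+1) x 1 column matrix
}

# Rough timing comparison on the large data above (illustrative):
# system.time(-2 * t(X) %*% (y - X %*% beta))
# system.time(gradient_crossprod(X, beta, y))
```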
The results:
> df_ols(beta,X,y) # Elementwise
$`gradient 1`
[,1]
[1,] 4804829
[2,] 53311879
[3,] 13471077
[4,] 73259191
$time
elapsed
3.59
> gradient_3(X, beta,y) # Vectorized Programming solution
$`gradient 3`
[,1]
[1,] 4804829
[2,] 53311879
[3,] 13471077
[4,] 73259191
$time
elapsed
0.89