When I run gradient descent, the value increases

Question

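In this code I wrote a gradient descent algorithm to optimize Theta0 and Theta1. At each step it checks whether the cost of the function with the new theta is smaller, and if so it keeps that theta. But on the very first step the cost with the new theta turns out to be larger. What should I do?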

import numpy as np
import matplotlib.pyplot as plt


x = np.array([460 ,232, 315, 178])
y = np.array([2104, 1416, 1534, 852])

class LinearFormala :
  def __init__ (self):
    self.theta_0 = np.random.rand()
    self.theta_1 = np.random.rand()
  def calc_y (self,x):
    return self.theta_0 + ( self.theta_1 * x )
  def cost (self,th,X,y):
    pred_y = np.array( [th[0]+(th[1]*x) for x in X] )
    return np.absolute(y - pred_y).mean()
  def GD (self, x_s, y_s, lr=0.001):
    optm_th0 = True
    optm_th1 = True
    tmp_th_0 = 0
    tmp_th_1 = 0
    for x,y in zip(x_s,y_s):
      if optm_th0:
        tmp_th_0 = self.theta_0 - lr * self.cost([self.theta_0,self.theta_1], x_s,y_s)
        print('here',self.cost([tmp_th_0,self.theta_1],x_s,y_s), self.cost([self.theta_0,self.theta_1],x_s,y_s))
        if self.cost([tmp_th_0,self.theta_1],x_s,y_s) < self.cost([self.theta_0,self.theta_1],x_s,y_s):
          self.theta_0 = tmp_th_0
        else:
          optm_th0=False
      if optm_th1:
        tmp_th_1 = self.theta_1 - lr * self.cost([self.theta_0,self.theta_1], x_s,y_s) * x
        print('here2',self.cost([self.theta_0,tmp_th_1],x_s,y_s), self.cost([self.theta_0,self.theta_1],x_s,y_s))
        if self.cost([self.theta_0,tmp_th_1],x_s,y_s) < self.cost([self.theta_0,self.theta_1],x_s,y_s):
          self.theta_1 = tmp_th_1
        else:
          optm_th1=False


model = LinearFormala()
print('Thetas :',[model.theta_0 , model.theta_1])
plt.plot(x,[model.calc_y(i) for i in x], label='Before Training')
plt.plot(x,y,'ro',label='Real Data')
plt.legend(loc='best')
model.GD(x,y)
print('Thetas :',[model.theta_0 , model.theta_1])

Answer 1

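I don't understand everything in your post, so I'll make a few assumptions; tell me if I'm wrong. First, the goal of the gradient descent algorithm is to minimize a function, so I assume the function you are trying to minimize is the cost function of the LinearFormala class.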

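The main error I see is that, to update the parameters self.theta_0 and self.theta_1, you use the cost function itself instead of the gradient of the cost function. The update rule is x_{k+1} = x_k − lr * ∇f(x_k), but what you did is x_{k+1} = x_k − lr * f(x_k).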
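To see why this matters, here is a minimal sketch on a hypothetical one-dimensional function f(x) = (x - 3)^2 + 1 (my own example, not taken from the question). Stepping by the gradient moves x toward the minimizer at x = 3. Stepping by the function value, which is always positive here, pushes x in the same direction on every step no matter where the minimum is.

```python
def f(x):
    # Hypothetical cost with its minimum at x = 3
    return (x - 3.0) ** 2 + 1.0


def grad_f(x):
    # Derivative of f
    return 2.0 * (x - 3.0)


def iterate(step, x0, lr=0.1, n_steps=100):
    # Apply x <- x - lr * step(x) a fixed number of times
    x = x0
    for _ in range(n_steps):
        x = x - lr * step(x)
    return x


x_grad = iterate(grad_f, x0=0.0)         # correct rule: step by the gradient
x_cost = iterate(f, x0=0.0, n_steps=5)   # rule from the question: step by the cost

print("gradient step:", x_grad, "cost:", f(x_grad))
print("cost step:    ", x_cost, "cost:", f(x_cost))
```

The second run is limited to a few steps because the iterates move away from the minimum faster and faster.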

So

tmp_th_0 = self.theta_0 - lr * self.cost([self.theta_0,self.theta_1], x_s,y_s)

becomes:

tmp_th_0 = self.theta_0 - lr * self.gradcost([self.theta_0,self.theta_1], x_s,y_s)[0]

and

tmp_th_1 = self.theta_1 - lr * self.cost([self.theta_0,self.theta_1], x_s,y_s) * x

becomes:

tmp_th_1 = self.theta_1 - lr * self.gradcost([self.theta_0,self.theta_1], x_s,y_s)[1]

with:

def gradcost(self, th, X, y):
    # (Sub)gradient of the mean absolute error computed by cost()
    X = np.asarray(X, dtype=float)
    residual = np.asarray(y, dtype=float) - (th[0] + th[1] * X)
    sign = np.sign(residual)
    grad_th0 = -sign.mean()        # d cost / d theta_0
    grad_th1 = -(sign * X).mean()  # d cost / d theta_1
    return np.array([grad_th0, grad_th1])

which you would define inside the LinearFormala class.
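One way to sanity-check a hand-written gradient is to compare it with a finite-difference estimate. The sketch below assumes the mean-absolute-error cost from the question and uses the question's data. The names mae_cost, mae_grad and numeric_grad are hypothetical standalone helpers, not part of the original class.

```python
import numpy as np


def mae_cost(th, X, y):
    # Same cost as in the question: mean absolute error of a line
    return np.abs(y - (th[0] + th[1] * X)).mean()


def mae_grad(th, X, y):
    # Analytic (sub)gradient of mae_cost with respect to th[0] and th[1]
    sign = np.sign(y - (th[0] + th[1] * X))
    return np.array([-sign.mean(), -(sign * X).mean()])


def numeric_grad(f, th, eps=1e-4):
    # Central finite differences, one coordinate at a time
    th = np.asarray(th, dtype=float)
    g = np.zeros_like(th)
    for i in range(th.size):
        step = np.zeros_like(th)
        step[i] = eps
        g[i] = (f(th + step) - f(th - step)) / (2 * eps)
    return g


X = np.array([460, 232, 315, 178], dtype=float)
y = np.array([2104, 1416, 1534, 852], dtype=float)
th = np.array([0.5, 0.5])

analytic = mae_grad(th, X, y)
numeric = numeric_grad(lambda t: mae_cost(t, X, y), th)
print("analytic:", analytic)
print("numeric: ", numeric)
```

If the two vectors disagree, the gradient function is the first place to look.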

With the gradient descent algorithm, you don't need a condition like this one:

if self.cost([tmp_th_0,self.theta_1],x_s,y_s) < self.cost([self.theta_0,self.theta_1],x_s,y_s):
      self.theta_0 = tmp_th_0

You simply update your parameters.

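Last but not least, you need a while loop: gradient descent is an iterative algorithm, and without the loop you only adjust the parameters once. To stop the loop, either wait until ||∇f(x_k)|| is small enough (for example ||∇f(x_k)|| < 1e-3), or stop when you reach a maximum number of iterations, itermax, of 1e3 or 1e4 (another value you need to define beforehand).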
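The loop and its two stopping rules could be sketched as follows. Two things here are my assumptions and not in the original code: I swap in a squared-error cost, because the sign-based subgradient of the absolute error does not shrink smoothly near the minimum, so the norm test would rarely trigger; and I standardize x so that a learning rate of 0.1 is stable. The names gradient_descent, mse_grad and x_scaled are hypothetical.

```python
import numpy as np


def gradient_descent(grad, th0, lr=0.1, tol=1e-3, max_iter=10_000):
    # Iterate th <- th - lr * grad(th) until the gradient norm drops
    # below tol or max_iter iterations have been performed.
    th = np.asarray(th0, dtype=float)
    for k in range(max_iter):
        g = grad(th)
        if np.linalg.norm(g) < tol:
            return th, k
        th = th - lr * g
    return th, max_iter


X = np.array([460, 232, 315, 178], dtype=float)
y = np.array([2104, 1416, 1534, 852], dtype=float)

# Assumption: standardize the feature (zero mean, unit variance)
x_scaled = (X - X.mean()) / X.std()


def mse_grad(th):
    # Gradient of 0.5 * mean((th[0] + th[1] * x_scaled - y) ** 2)
    residual = (th[0] + th[1] * x_scaled) - y
    return np.array([residual.mean(), (residual * x_scaled).mean()])


th, n_iter = gradient_descent(mse_grad, [0.0, 0.0])
print("thetas (for the scaled feature):", th, "iterations:", n_iter)
```

The fitted thetas apply to the scaled feature, so predictions for a new x need the same scaling applied first.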

I hope this helps. If anything is unclear, don't hesitate to ask, and good luck with the rest of your coding.
