Gradient descent weights keep growing


To get familiar with the gradient descent algorithm, I tried to build my own linear regression model. It works fine for a small number of data points, but when I try to fit it with more data, the magnitudes of w0 and w1 keep increasing. Can someone explain this behavior?

import numpy as np

class LinearRegression:
    def __init__(self, x_vector, y_vector):

        self.x_vector = np.array(x_vector, dtype=np.float64)
        self.y_vector = np.array(y_vector, dtype=np.float64)
        self.w0 = 0
        self.w1 = 0

    def _get_predicted_values(self, x):
        return self.w0 + self.w1 * x

    def _get_gradient_matrix(self):
        predictions = self._get_predicted_values(self.x_vector)
        w0_hat = (self.y_vector - predictions).sum()
        w1_hat = ((self.y_vector - predictions) * self.x_vector).sum()

        gradient_matrix = np.array([w0_hat, w1_hat])
        gradient_matrix = -2 * gradient_matrix

        return gradient_matrix

    def fit(self, step_size=0.001, num_iterations=500):
        for _ in range(num_iterations):
            gradient_matrix = self._get_gradient_matrix()
            self.w0 -= step_size * (gradient_matrix[0])
            self.w1 -= step_size * (gradient_matrix[1])

    def show_coefficients(self):
        print(f"w0: {self.w0}\tw1: {self.w1}")

    def predict(self, x):
        y = self.w0 + self.w1 * x
        return y
# This works fine
x = [x for x in range(-3, 3)]
f = lambda x: 5 * x - 7
y = [f(x_val) for x_val in x]

model = LinearRegression(x, y)
model.fit(num_iterations=3000)

model.show_coefficients()  # output: w0: -6.99999999999994   w1: 5.00000000000002

# While this doesn't
x = [x for x in range(-50, 50)] # Increased the number of x values
f = lambda x: 5 * x - 7
y = [f(x_val) for x_val in x]

model = LinearRegression(x, y)
model.fit(num_iterations=3000)

model.show_coefficients()

The last line produces a warning:

RuntimeWarning: overflow encountered in multiply
w1_hat = sum((self.y_vector - predictions) * self.x_vector)
formula = lambda x: self.w0 + self.w1 * x
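The overflow comes from the scale of the unnormalized gradient: the w1 term contains sum(x_i^2), so each update to w1 is scaled by roughly (1 - 2 * step_size * sum(x^2)). When that factor's magnitude exceeds 1, every iteration amplifies the weights until the multiplication overflows. A quick check of the factor for the two datasets above (a sketch, just arithmetic on the same ranges):

```python
import numpy as np

# Rough stability check for the unnormalized squared-error gradient:
# the w1 update is scaled by about (1 - 2 * step_size * sum(x^2)).
# A magnitude above 1 means each iteration grows the weights.
step_size = 0.001
for lo, hi in [(-3, 3), (-50, 50)]:
    x = np.arange(lo, hi, dtype=np.float64)
    s = (x * x).sum()
    factor = 1 - 2 * step_size * s
    print(f"x in [{lo}, {hi}): sum(x^2) = {s:.0f}, update factor = {factor:.3f}")
```

For the small range, sum(x^2) is 19 and the factor is 0.962, so the iteration contracts toward the optimum; for range(-50, 50), sum(x^2) is 83350 and the factor is about -165.7, so the weights diverge.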
python numpy machine-learning linear-regression gradient-descent
1 Answer

There are two possible fixes here:

  1. If we are talking about MSE and its derivative, then one thing is missing from your code: division by the number of samples. You are getting very large gradient values, which is likely why you cannot reach the minimum of the cost function. So I suggest you try this:
    gradient_matrix = -2 * gradient_matrix / len(self.x_vector)
  2. If you really want to keep using the (unnormalized) squared error, reduce the step_size value so the gradient steps get smaller and you do not overshoot the minimum of the function.
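The first suggestion can be sketched as a compact rewrite of the class from the question, with the division by n added (the class name LinearRegressionMSE is introduced here for illustration; the iteration count is raised because the intercept direction now converges slowly):

```python
import numpy as np

class LinearRegressionMSE:
    """Same model as in the question, but with the gradient divided by n (MSE)."""

    def __init__(self, x_vector, y_vector):
        self.x_vector = np.array(x_vector, dtype=np.float64)
        self.y_vector = np.array(y_vector, dtype=np.float64)
        self.w0 = 0.0
        self.w1 = 0.0

    def _get_gradient_matrix(self):
        predictions = self.w0 + self.w1 * self.x_vector
        errors = self.y_vector - predictions
        gradient_matrix = np.array([errors.sum(), (errors * self.x_vector).sum()])
        # Dividing by n makes this the MSE gradient, so its scale
        # no longer grows with the number of samples.
        return -2 * gradient_matrix / len(self.x_vector)

    def fit(self, step_size=0.001, num_iterations=5000):
        for _ in range(num_iterations):
            gradient_matrix = self._get_gradient_matrix()
            self.w0 -= step_size * gradient_matrix[0]
            self.w1 -= step_size * gradient_matrix[1]

x = list(range(-50, 50))        # the dataset that previously overflowed
y = [5 * xi - 7 for xi in x]

model = LinearRegressionMSE(x, y)
model.fit()
print(f"w0: {model.w0:.4f}\tw1: {model.w1:.4f}")  # close to -7 and 5
```

With the same step_size of 0.001, the larger dataset now converges instead of overflowing, because the per-sample averaging keeps the gradient scale roughly independent of dataset size.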