Linear regression and gradient descent from scratch in Python

Question (votes: 0, answers: 1)

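I am trying to run the following linear regression code, written from scratch. When I create an object of my linear regression class and call its fit method, I get a ValueError.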

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('/Users/MyName/Downloads/archive/prices.csv')
X = df['volume'].values
y = df['close'].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

class Lin_Reg():
    def __init__(self, lr=0.01, n_iters=10000):
        self.lr = lr
        self.n_iters = n_iters
        self.weights = None
        self.bias = None
        
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        
        for _ in range(self.n_iters):
            y_pred = np.dot(X, self.weights) + self.bias

            dw = (1/n_samples) * np.dot(X, (y_pred - y))
            db = (1/n_samples) * np.sum(y_pred-y)

            self.weight = self.weight -self.lr * dw
            self.bias = self.bias -self.lr * db
    
    def predict(self, X):
        y_pred = np.dot(X, self.weights) + self.bias
        return y_pred

reg = Lin_Reg()
reg.fit(X_train, y_train)
predictions = reg.predict(X_test)

The error message is

ValueError: not enough values to unpack (expected 2, got 1)


The line that raises this error is n_samples, n_features = X.shape

The dataset I am using can be found here: https://www.kaggle.com/datasets/dgawlik/nyse. I am using the prices.csv file.

python machine-learning linear-regression gradient-descent
1 answer
0 votes

The problem is in this line:

X = df['volume'].values

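This gives you only a single column, as a 1-D array whose shape is (N,), where N is the number of rows. Because that shape is a one-element tuple, unpacking it into two names on this line raises the error: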

n_samples, n_features = X.shape
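
To see the failure in isolation, here is a minimal sketch. The small array is a hypothetical stand-in for df['volume'].values, not data from prices.csv:

```python
import numpy as np

# Hypothetical stand-in for df['volume'].values: a 1-D array of 5 samples.
X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

print(X.shape)       # a one-element tuple: (5,)
print(len(X.shape))  # number of dimensions: 1

try:
    # Two names on the left, but only one value in the shape tuple.
    n_samples, n_features = X.shape
except ValueError as err:
    message = str(err)
    print(message)
```

The unpacking fails for any 1-D array, regardless of how many rows it has, because the shape tuple has one entry while the assignment expects two.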

In your case, you can do this:

n_samples, n_features = len(X), 1
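
Note that this change alone leaves X one-dimensional, so later expressions such as np.dot(X, self.weights) will likely still fail with a shape mismatch. An alternative is to reshape X into a column of shape (N, 1) before fitting. The sketch below assumes that approach, plus two further fixes to the question's code that the answer above does not cover: the gradient uses X.T, and the update writes to self.weights rather than self.weight. The class name LinReg, the synthetic data, and the learning rate are illustrative assumptions, not the asker's setup.

```python
import numpy as np

class LinReg:
    """Gradient-descent linear regression (illustrative sketch)."""

    def __init__(self, lr=0.01, n_iters=10000):
        self.lr = lr
        self.n_iters = n_iters
        self.weights = None
        self.bias = None

    @staticmethod
    def _as_2d(X):
        # Turn a 1-D array of shape (N,) into a column of shape (N, 1).
        X = np.asarray(X, dtype=float)
        return X.reshape(-1, 1) if X.ndim == 1 else X

    def fit(self, X, y):
        X = self._as_2d(X)
        n_samples, n_features = X.shape  # now always two values
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        for _ in range(self.n_iters):
            error = X @ self.weights + self.bias - y
            dw = (1 / n_samples) * (X.T @ error)  # X.T, shape (n_features,)
            db = (1 / n_samples) * np.sum(error)
            self.weights -= self.lr * dw          # weights, not weight
            self.bias -= self.lr * db
        return self

    def predict(self, X):
        return self._as_2d(X) @ self.weights + self.bias

# Synthetic data (an assumption for the demo): y = 3x + 2, no noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 3.0 * X + 2.0

reg = LinReg(lr=0.1, n_iters=2000).fit(X, y)
predictions = reg.predict(X)
print(reg.weights, reg.bias)
```

On features with very large raw values, which trading volume probably has, a fixed learning rate like 0.01 may make the updates diverge. Standardizing the feature first (subtracting the mean and dividing by the standard deviation) is a common remedy, though whether it is needed here depends on the actual data.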