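I am trying to run the following linear regression code, which I wrote from scratch. When I create an object of my linear regression class and call its fit method, I get an error.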
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('/Users/MyName/Downloads/archive/prices.csv')
X = df['volume'].values
y = df['close'].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
class Lin_Reg():
    def __init__(self, lr=0.01, n_iters=10000):
        self.lr = lr
        self.n_iters = n_iters
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        for _ in range(self.n_iters):
            y_pred = np.dot(X, self.weights) + self.bias
            dw = (1/n_samples) * np.dot(X, (y_pred - y))
            db = (1/n_samples) * np.sum(y_pred-y)
            self.weight = self.weight -self.lr * dw
            self.bias = self.bias -self.lr * db

    def predict(self, X):
        y_pred = np.dot(X, self.weights) + self.bias
        return y_pred
reg = Lin_Reg()
reg.fit(X_train, y_train)
predictions = reg.predict(X_test)
The error message is:
ValueError: not enough values to unpack (expected 2, got 1)
The line that raises this error is n_samples, n_features = X.shape
The dataset I am using can be found here: https://www.kaggle.com/datasets/dgawlik/nyse. I am using the prices.csv file.
The problem is in this line:
X = df['volume'].values
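This gives you only a single column, so X is a one-dimensional array whose shape is (N,), where N is the number of rows. Because that shape is a one-element tuple, it cannot be unpacked into two names, and this line raises the error: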
n_samples, n_features = X.shape
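The behaviour described above can be reproduced without the dataset. The snippet below is a minimal sketch that uses a small made-up array as a stand-in for df['volume'].values; the values and the name message are illustrative only.

```python
import numpy as np

# Made-up stand-in for df['volume'].values: a single column is a 1-D array.
X = np.array([10.0, 20.0, 30.0])
print(X.shape)  # (3,)

message = None
try:
    # Unpacking a one-element tuple into two names fails.
    n_samples, n_features = X.shape
except ValueError as exc:
    message = str(exc)
    print(message)
```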
In your case, you can do this:
n_samples, n_features = len(X), 1
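An alternative that is not part of the answer above is to reshape the column into a two-dimensional array of shape (N, 1) before fitting, so that X.shape unpacks without changing fit. This may also be needed in practice, because np.dot between an array of shape (N,) and a weight vector of shape (1,) raises a shape-alignment error. The sketch below assumes that approach and runs on small synthetic data rather than prices.csv. It also makes two changes to the question's class that are assumptions about the intended behaviour: the update writes to self.weights instead of self.weight, and the weight gradient uses X.T. The names X_demo and y_demo, and the lr and n_iters values, are illustrative.

```python
import numpy as np

class Lin_Reg:
    def __init__(self, lr=0.01, n_iters=10000):
        self.lr = lr
        self.n_iters = n_iters
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        # X is expected to be 2-D: (n_samples, n_features).
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        for _ in range(self.n_iters):
            y_pred = np.dot(X, self.weights) + self.bias
            error = y_pred - y
            # Assumed fix: X.T so the gradient has shape (n_features,).
            dw = (1 / n_samples) * np.dot(X.T, error)
            db = (1 / n_samples) * np.sum(error)
            # Assumed fix: update self.weights, not self.weight.
            self.weights = self.weights - self.lr * dw
            self.bias = self.bias - self.lr * db

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

# Synthetic data following y = 2x + 1, reshaped into a single-feature column.
X_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)
y_demo = 2.0 * X_demo[:, 0] + 1.0

reg = Lin_Reg(lr=0.05, n_iters=5000)
reg.fit(X_demo, y_demo)
predictions = reg.predict(np.array([5.0, 6.0]).reshape(-1, 1))
print(predictions)
```

With the real data, the equivalent change would be X = df['volume'].values.reshape(-1, 1) before calling train_test_split.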