R² score problem with multiple linear regression on house prices

Question   Votes: 0   Answers: 1

I have sample house price data and some simple code:

import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

data = pd.read_csv('house_price_4.csv')
df = pd.DataFrame(data)
df['Area'] = df['Area'].str.replace(',', '')
df = df.dropna()

# Encoding the categorical feature 'Address'
df['Address'] = df['Address'].astype('category').cat.codes
df['Parking'] = df['Parking'].replace({True: 1, False: 0})
df['Warehouse'] = df['Warehouse'].replace({True: 1, False: 0})
df['Elevator'] = df['Elevator'].replace({True: 1, False: 0})

X = df.drop(columns=['Price(USD)','Price'])
y = df['Price']


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

r_squared = r2_score(y_test, y_pred)
print(f'R^2 Score: {r_squared:.4f}')

My R² score is very low: 0.34

How can I get a higher R² score?

Here is my sample data: https://drive.google.com/file/d/14Se90XbGJivftq3_VrtgRSalkCplduVX/view?usp=sharing

python pandas machine-learning regression sklearn-pandas
1 answer
0 votes

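Besides linear regression, you can try other models to test whether the data can be modeled at all. By the way, the low R² is not the biggest problem with using linear regression here. Use the code in this answer to study the residuals in both cases, because the residuals of the linear regression clearly point to heteroscedasticity. Compare the two models here: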

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

# XTrain, XTest, yTrain, yTest are the train/test split from the question
# (named X_train, X_test, y_train, y_test there)
fig, axs = plt.subplots(nrows=1, ncols=2)  # one panel per model
###################################################################################
lrModel = LinearRegression()  # linear regression
lrModel.fit(XTrain, yTrain)  # fit
lryPred = lrModel.predict(XTest)  # predict on the test set
# RMSE as sqrt(MSE); `squared=False` was deprecated in scikit-learn 1.4
lrRMSE = mean_squared_error(yTest, lryPred) ** 0.5
lrR2 = r2_score(yTest, lryPred)  # R2
axs[0].scatter(lryPred, yTest)  # predicted vs. actual
axs[0].set_title("Linear Regression\nR² = "+str(round(lrR2,2))+"; RMSE = "+str(round(lrRMSE)))
###################################################################################
dtModel = DecisionTreeRegressor(random_state=42)  # decision tree
dtModel.fit(XTrain, yTrain)  # fit
dtyPred = dtModel.predict(XTest)  # predict on the test set
dtRMSE = mean_squared_error(yTest, dtyPred) ** 0.5  # RMSE
dtR2 = r2_score(yTest, dtyPred)  # R2
axs[1].scatter(dtyPred, yTest)  # predicted vs. actual
axs[1].set_title("Decision Tree Regressor\nR² = "+str(round(dtR2,2))+"; RMSE = "+str(round(dtRMSE)))
plt.show()
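
The scatter plots above show predicted against actual values. To look at heteroscedasticity more directly, one option is to plot the residuals against the predictions. The following is only a minimal sketch: it uses synthetic data in which the noise grows with the area (an assumption standing in for the real CSV), and the names `area`, `price` and the output file `residuals.png` are made up for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (assumption, not the real file):
# the noise standard deviation is proportional to the area.
rng = np.random.default_rng(42)
area = rng.uniform(30, 300, size=500)
price = 1000 * area + rng.normal(0, 1, size=500) * 200 * area
X = area.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.3, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
residuals = y_test - y_pred  # actual minus predicted

# Residuals vs. predictions: a funnel shape suggests heteroscedasticity
fig, ax = plt.subplots()
ax.scatter(y_pred, residuals, s=10)
ax.axhline(0, color="black", linewidth=1)
ax.set_xlabel("Predicted price")
ax.set_ylabel("Residual (actual - predicted)")
ax.set_title("Residuals vs. predictions")
fig.savefig("residuals.png")

# Rough numeric check: residual spread in the lower vs. upper half of predictions
order = np.argsort(y_pred)
half = len(order) // 2
spread_low = residuals[order[:half]].std()
spread_high = residuals[order[half:]].std()
print(f"Residual std (low predictions):  {spread_low:.0f}")
print(f"Residual std (high predictions): {spread_high:.0f}")
```

If the spread of the residuals clearly grows with the predicted value, as it does by construction in this synthetic example, the constant-variance assumption of ordinary least squares is questionable. On the real data, `X` and `price` would be replaced by the features and target from the question.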

The result of this comparison looks like this:

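Linear regression was the wrong choice from the start. It also produces negative predictions. Use a decision tree or a random forest instead; the two should give similar fits.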
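
The answer's code covers the decision tree but not the random forest. Here is a hedged sketch of a three-way comparison on synthetic data. The data is an assumption: the price depends on the area and on an integer-coded address whose numeric order carries no meaning, similar to what `cat.codes` produces in the question. This is one plausible reason a linear model struggles there, not a confirmed diagnosis of the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in (assumption): 20 addresses, each with its own price level.
rng = np.random.default_rng(0)
n = 2000
area = rng.uniform(30, 300, size=n)
address = rng.integers(0, 20, size=n)            # arbitrary category codes
price_per_m2 = rng.uniform(500, 3000, size=20)   # one price level per address
price = area * price_per_m2[address] + rng.normal(0, 10000, size=n)
X = np.column_stack([area, address])

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.3, random_state=42
)

models = {
    "Linear regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(random_state=42),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

scores = {}
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    scores[name] = r2_score(y_test, predictions[name])
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Tree-based models can split on the address code without assuming that a larger code means a higher price, which a linear model implicitly does. Their predictions are averages of training targets, so they cannot fall below the smallest price seen during training, whereas a linear model can extrapolate to negative values.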
