How can I improve my model's accuracy?


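I am working on my first data science project, which predicts NBA player salaries. I tried two models on the data, but my accuracy scores (R²) are very low. Can anyone help me improve them? Thanks.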

r2_score for linear regression:0.5836029556187516

r2_score for random forest regressor:0.6287935547320641

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score
from sklearn import metrics
import math

df = pd.read_csv('nba_eda.csv')
df_model = df[['CurrentSalary', 'PTS', 'MP', 'Age', 'G','WS', 'STL', 'TRB',  'AST', 'BLK', 'TOV']]

# One-hot encode categorical columns, if any are present
df_test = pd.get_dummies(df_model)
from sklearn.model_selection import train_test_split

# Separate the features from the target
X = df_test.drop('CurrentSalary', axis=1)
Y = df_test.CurrentSalary.values

# Hold out 30% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)
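# Linear regression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
regressor.score(x_train, y_train)  # R^2 on the training set
y_pred = regressor.predict(x_test)
accuracy = r2_score(y_test, y_pred)  # R^2 on the test set

# Random forest (despite the variable name lr, this is not a linear regression)
lr = RandomForestRegressor(n_estimators=100)
lr.fit(x_train, y_train)
prediction = lr.predict(x_test)
acc = r2_score(y_test, prediction)  # R^2 on the test set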
data-science linear-regression random-forest data-modeling test-data
1 Answer

I think a more complex model might improve the score. You could try polynomial regression, for example. Here is a code example:

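from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Expand the features with all degree-2 terms (squares and pairwise products)
pf = PolynomialFeatures(degree=2)
X_polynomial = pf.fit_transform(X)
# Fit an ordinary linear regression on the expanded features
linModel = LinearRegression()
linModel.fit(X_polynomial, Y)  # Y is the target array defined in the question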
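
The snippet above fits on the full X, so it does not produce a test score comparable to the two in the question. A minimal sketch of one way to get one is shown here: fit the polynomial expansion on the training split only, then score on the held-out split. The data is a synthetic stand-in (an assumption, since nba_eda.csv is not available here), and the names x_train_poly, x_test_poly, lin_model, poly_r2 and baseline_r2 are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the question's X and Y (assumption: the real CSV is not available)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
Y = 2.0 * X[:, 0] + X[:, 1] ** 2 + X[:, 2] * X[:, 3] + rng.normal(scale=0.1, size=300)

# Same split settings as in the question
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

# Learn the expansion on the training rows, then apply the same expansion to the test rows
pf = PolynomialFeatures(degree=2)
x_train_poly = pf.fit_transform(x_train)
x_test_poly = pf.transform(x_test)

# Polynomial regression: a linear model on the expanded features
lin_model = LinearRegression().fit(x_train_poly, y_train)
poly_r2 = r2_score(y_test, lin_model.predict(x_test_poly))

# Plain linear regression on the raw features, for comparison
baseline_r2 = r2_score(y_test, LinearRegression().fit(x_train, y_train).predict(x_test))

print(f"linear R^2: {baseline_r2:.3f}, degree-2 R^2: {poly_r2:.3f}")
```

On real data the gain may be much smaller than on this synthetic example, which was built to contain squared and interaction terms.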

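You can also try increasing the degree of the polynomial features, though higher degrees add many more features and can overfit, so compare scores on held-out data.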
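
One way to choose the degree is to compare cross-validated R² across a few candidates. The sketch below does that, assuming synthetic data in place of the question's X and Y. The scaling step and the names mean_r2 and best_degree are my own additions for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the question's X and Y (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)

mean_r2 = {}
for degree in (1, 2, 3, 4):
    # The pipeline refits the scaler and the expansion inside every fold,
    # so no information from the validation rows leaks into training
    model = make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=degree),
        LinearRegression(),
    )
    scores = cross_val_score(model, X, Y, cv=5, scoring="r2")
    mean_r2[degree] = scores.mean()
    print(f"degree {degree}: mean R^2 = {mean_r2[degree]:.3f}")

# Degree with the highest mean cross-validated R^2
best_degree = max(mean_r2, key=mean_r2.get)
```

If the score stops improving or drops as the degree grows, that usually means the extra terms are fitting noise.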
