How does persisting the model improve accuracy?

Problem description
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

whitewine_data = pd.read_csv('winequality-white.csv', delimiter=';')

# Columns used as input features
variables = ['alcohol_cat', 'alcohol', 'sulphates', 'density',
             'total sulfur dioxide', 'citric acid', 'volatile acidity',
             'chlorides']

X = whitewine_data[variables]
y = whitewine_data['quality']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='weighted')

# Single hand-made sample; values must follow the same column order as `variables`
predictions = model.predict([[0.27, 0.36, 0.045, 170, 1.001, 0.45, 8.9, 0]])
print(f'Predicted Output: {predictions}')
print(f'Accuracy: {accuracy * 100}%')
print(f'F1 Score: {f1 * 100}%')

This initial model achieves an accuracy score of 57%.

===================================================================

import pandas as pd
import joblib
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

whitewine_data = pd.read_csv('winequality-white.csv', delimiter=';')

# Variables to be dropped from the data set - NOT THE INPUT VARIABLES
variables = ['fixed acidity', 'residual sugar', 'free sulfur dioxide',
             'pH', 'quality', 'isSweet']

X = whitewine_data.drop(variables, axis=1)
y = whitewine_data['quality']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

joblib.dump(model, 'WhiteWine_Quality_Predictor.joblib')

This creates and saves the model.

===================================================================

import pandas as pd
import joblib
from sklearn.metrics import accuracy_score, f1_score

whitewine_data = pd.read_csv('winequality-white.csv', delimiter=';')

variables = ['volatile acidity', 'citric acid', 'chlorides',
             'total sulfur dioxide', 'density', 'sulphates', 'alcohol',
             'alcohol_cat']

X_test = whitewine_data[variables]
y_test = whitewine_data['quality']

model = joblib.load('WhiteWine_Quality_Predictor.joblib')

y_pred = model.predict(X_test)

f1 = f1_score(y_test, y_pred, average='weighted')
accuracy = accuracy_score(y_test, y_pred)
# Single hand-made sample; values must be in the same column order the model was trained on
predictions = model.predict([[0.27, 0.36, 0.045, 170, 1.001, 0.45, 10.9, 3]])

print(f'F1 Score: {f1 * 100}%')
print(f'Model Accuracy: {accuracy * 100}%')
print(f'Predicted Output: {predictions}')

Loading the saved model now achieves 92% accuracy.

Question: how does loading the saved model lead to the increase in accuracy that I am seeing?

python classification decision-tree ml
1 Answer

This is a very common mistake when starting out with ML algorithms.

In the second script, you train the algorithm on the winequality-white.csv dataset and then save it. That is perfectly fine.

The problem is that in the third script you apply the algorithm to exactly the same dataset it was trained on. You are essentially predicting the observations that were used for training, so it is no surprise that the algorithm predicts them with close to 100% accuracy.
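You can see this effect directly by scoring the same tree on its training rows versus a held-out split. The sketch below is only an illustration (same CSV, 'quality' as the target, and a fixed random_state added for reproducibility; it is not taken from your scripts):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = pd.read_csv('winequality-white.csv', delimiter=';')
X = data.drop('quality', axis=1)
y = data['quality']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Scoring on the rows the tree was fitted on is optimistic - it has effectively memorised them
print('Train accuracy:', accuracy_score(y_train, model.predict(X_train)))
# Scoring on the held-out rows gives the honest estimate
print('Test accuracy:', accuracy_score(y_test, model.predict(X_test)))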

The way you store the algorithm is correct, but to actually use it for predictions you need a different dataset, not the one it was trained on.
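Concretely, the fix is to keep a held-out split aside when you train, persist it together with the model, and only score the loaded model on that split (or on genuinely new samples). A minimal sketch of that workflow follows; the extra 'WhiteWine_holdout.joblib' file name is my own choice, not part of your scripts:

import pandas as pd
import joblib
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

data = pd.read_csv('winequality-white.csv', delimiter=';')
X = data.drop('quality', axis=1)
y = data['quality']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train on the training portion only, then persist the model AND the held-out split
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
joblib.dump(model, 'WhiteWine_Quality_Predictor.joblib')
joblib.dump((X_test, y_test), 'WhiteWine_holdout.joblib')

# Later, in a separate script: load both and evaluate on data the model never saw
model = joblib.load('WhiteWine_Quality_Predictor.joblib')
X_test, y_test = joblib.load('WhiteWine_holdout.joblib')
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred) * 100:.1f}%')
print(f'F1 Score: {f1_score(y_test, y_pred, average="weighted") * 100:.1f}%')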
