How to use a trained model (ordinal regression) to make predictions on a new DataFrame and add a column of predicted values


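I'm new to data science and am teaching myself the basics.

I have two datasets: one for training (train.csv), on which the model is estimated, and a separate CSV file, test.csv, whose values the model should predict.

How can I use the model developed and trained on train.csv to create a prediction column in the test.csv DataFrame that contains all of the predicted results?

Here is what I did: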

import numpy as np
import pandas as pd
import scipy.stats as stats

from statsmodels.miscmodels.ordinal_model import OrderedModel
from pandas.api.types import CategoricalDtype
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
import statsmodels.api as sm

#Data from Train CSV file
df = pd.read_csv('train.csv')

#Data to make prediction from the model developed from Train.CSV file
df_test = pd.read_csv('test.csv')

weather_type= pd.CategoricalDtype(categories = ['clear', 'clouds','rain'], ordered = True)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.60)

log_prob = OrderedModel(y_train, x_train, distr='logit')

#Train the model
log_p = log_prob.fit(method='bfgs')


#Use model and make prediction from the train.csv file
y_hat = log_p.predict(df_test)

#Create column - weather conditions in the test.csv file
df_test['weather-conditions'] = y_hat


But I get the following error message:

ValueError: Cannot set a DataFrame with multiple columns to the single column weather-conditions

Could someone give me some pointers? Thanks.

python data-science logistic-regression ordinal
1 Answer

I don't see where you declare x and y in this line:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.60)
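
For illustration, here is one way x and y might be defined before the split. This is only a sketch: the question never shows the columns of train.csv, so the column names `weather`, `humidity` and `pressure` and the inline data below are hypothetical stand-ins.

```python
import pandas as pd

# Stand-in for pd.read_csv('train.csv'); the real columns are not shown
# in the question, so these names and values are hypothetical.
df = pd.DataFrame({
    'humidity': [0.30, 0.55, 0.90, 0.40, 0.75, 0.95],
    'pressure': [1.02, 1.00, 0.97, 1.01, 0.99, 0.96],
    'weather':  ['clear', 'clouds', 'rain', 'clear', 'clouds', 'rain'],
})

# Ordered categorical type taken from the question's code.
weather_type = pd.CategoricalDtype(categories=['clear', 'clouds', 'rain'],
                                   ordered=True)

# y: the ordinal target; x: the numeric feature columns used as predictors.
y = df['weather'].astype(weather_type)
x = df[['humidity', 'pressure']]
```

With x and y defined this way, the `train_test_split(x, y, ...)` call from the question has something to work on.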

You probably want to predict y_test rather than df_test in this call:

#Use model and make prediction from the train.csv file
y_hat = log_p.predict(df_test)