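I'm new to data science and am teaching myself the basics.
I have two data sets: one for training (train.csv), on which the model is fitted, and a separate CSV file, test.csv, whose values the fitted model should predict.
How do I take the model developed and trained on train.csv and add a prediction column, containing all the predicted results, to the test.csv DataFrame?
Here is what I did: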
import numpy as np
import pandas as pd
import scipy.stats as stats
from statsmodels.miscmodels.ordinal_model import OrderedModel
from pandas.api.types import CategoricalDtype
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
import statsmodels.api as sm
#Data from Train CSV file
df = pd.read_csv('train.csv')
#Data to make prediction from the model developed from Train.CSV file
df_test = pd.read_csv('test.csv')
weather_type= pd.CategoricalDtype(categories = ['clear', 'clouds','rain'], ordered = True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.60)
log_prob = OrderedModel(y_train, x_train, distr='logit')
#Train the model
log_p = log_prob.fit(method='bfgs')
#Use model and make prediction from the train.csv file
y_hat = log_p.predict(df_test)
#Create column - weather conditions in the test.csv file
df_test['weather-conditions'] = y_hat
But I get the following error message:
ValueError: Cannot set a DataFrame with multiple columns to the single column weather-conditions
Can anyone give me some pointers? Thanks.
I don't see where you define x and y for this line:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.60)
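For illustration, x and y could be built from the training frame along these lines. This is only a sketch: the column names temperature, humidity, and weather and the inline rows are made up, since the question does not show what train.csv contains.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for pd.read_csv('train.csv'); column names are assumptions.
df = pd.DataFrame({
    "temperature": [30.1, 22.4, 18.0, 25.3, 16.2, 28.7, 19.5, 21.0],
    "humidity": [0.20, 0.55, 0.90, 0.40, 0.95, 0.25, 0.85, 0.60],
    "weather": ["clear", "clouds", "rain", "clouds",
                "rain", "clear", "rain", "clouds"],
})

# Ordered categorical dtype, as in the question.
weather_type = pd.CategoricalDtype(categories=["clear", "clouds", "rain"], ordered=True)

# Predictors (x) and ordered target (y) must exist before the split.
x = df[["temperature", "humidity"]]
y = df["weather"].astype(weather_type)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.60, random_state=0
)
```

Whatever columns end up in x are the ones test.csv must also provide when you call predict later.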
You probably want to predict on your held-out split (pass x_test and compare the result with y_test) rather than on df_test:
#Use model and make prediction from the train.csv file
y_hat = log_p.predict(df_test)
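As for the ValueError itself: as far as I can tell, predict on a fitted OrderedModel returns one probability per category for each row by default, so y_hat has three columns and cannot be assigned to a single column. A minimal sketch of collapsing it to one label per row follows. The probs frame is a made-up stand-in for y_hat, the temperature column is hypothetical, and the sketch assumes the probability columns come in the same order as the categories.

```python
import pandas as pd

# Ordered categories from the question.
categories = ["clear", "clouds", "rain"]

# Hypothetical stand-in for y_hat = log_p.predict(...):
# one row per observation, one probability column per category.
probs = pd.DataFrame([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])

# Hypothetical stand-in for the test.csv frame.
df_test = pd.DataFrame({"temperature": [29.0, 17.5, 22.0]})

# Position of the largest probability in each row, mapped to its label.
best = probs.to_numpy().argmax(axis=1)
df_test["weather-conditions"] = [categories[i] for i in best]
```

Note that df_test passed to predict should contain only the predictor columns the model was trained on, in the same order.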