Classification_report between two files

Question  Votes: 1  Answers: 2

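I want to compute scores between two files. Both contain the same texts but not the same labels: the labels in the train file are the corrected ones, while the labels in the test file are not necessarily correct. I would like to know the accuracy, recall, and F-score.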

import pandas
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn import cross_validation
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_score, recall_score, confusion_matrix, classification_report, accuracy_score, f1_score

df_train = pd.read_csv('train.csv', sep = ',')
df_test = pd.read_csv('teste.csv', sep = ',')

vec_train = TfidfVectorizer()
X_train = vec_train.fit_transform(df_train['text'])
y_train = df_train['label']

vec_test = TfidfVectorizer()
X_test = vec_test.fit_transform(df_train['text'])
y_test = df_test['label']

clf = LogisticRegression(penalty='l2', multi_class = 'multinomial',solver ='newton-cg')

y_pred = clf.predict(X_test)

print ("Accuracy on training set:")
print (clf.score(X_train, y_train))
print ("Accuracy on testing set:")
print (clf.score(X_test, y_test))
print ("Classification Report:")
print (metrics.classification_report(y_test, y_pred))

A silly example of the data:

TRAIN
text,label
dogs are cool,animal
flowers are beautifil,plants
pen is mine,objet
beyonce is an artist,person

TEST
text,label
dogs are cool,objet
flowers are beautifil,plants
pen is mine,person
beyonce is an artist,animal

Error:

Traceback (most recent call last):
  File "accuracy.py", line 30, in <module>
    y_pred = clf.predict(X_test)
  File "/usr/lib/python3/dist-packages/sklearn/linear_model/base.py", line 324, in predict
    scores = self.decision_function(X)
  File "/usr/lib/python3/dist-packages/sklearn/linear_model/base.py", line 298, in decision_function
    "yet" % {'name': type(self).__name__})
sklearn.exceptions.NotFittedError: This LogisticRegression instance is not fitted yet

I just want to compute the accuracy on the test data.

python python-3.x machine-learning scikit-learn metrics
2 Answers
1
vote

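You are fitting a new TfidfVectorizer for the test data, which gives wrong results: a separately fitted vectorizer learns its own vocabulary, so its columns do not line up with the features the model is trained on. Use the same object that you fitted on the training data. (Note also that the question's code builds X_test from df_train['text']; the test features should come from df_test['text'].)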

Do this:

vec_train = TfidfVectorizer()
X_train = vec_train.fit_transform(df_train['text'])

X_test = vec_train.transform(df_test['text'])
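
To see why the shared vectorizer matters, here is a minimal sketch that compares the feature matrices produced both ways. The texts below are made up for illustration (they are not the asker's files), and it assumes a reasonably recent scikit-learn:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical texts, chosen so the two vocabularies differ.
train_texts = ["dogs are cool", "pen is mine"]
test_texts = ["cats are cool too"]

# Fit the vocabulary on the training texts only.
vec_train = TfidfVectorizer()
X_train = vec_train.fit_transform(train_texts)

# Wrong: a second vectorizer learns its own vocabulary from the test texts.
X_test_wrong = TfidfVectorizer().fit_transform(test_texts)

# Right: reuse the fitted vectorizer, so the columns mean the same thing.
X_test = vec_train.transform(test_texts)

print("train:", X_train.shape)
print("test, separate vectorizer:", X_test_wrong.shape)
print("test, shared vectorizer:", X_test.shape)
```

With the shared vectorizer the test matrix has the same number of columns as the training matrix, and words unseen during training (here "cats" and "too") are simply ignored.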

Once the test features are built this way, as @MohammedKashif said, you need to train your LogisticRegression model first and only then predict on the test data.

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

After that, you can use your scoring code without any errors.
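
Putting the pieces together, a rough end-to-end sketch could look like the following. It builds the two DataFrames inline from the sample rows in the question rather than reading train.csv and teste.csv, and it uses LogisticRegression with default settings instead of the question's parameters; both choices are assumptions made to keep the example self-contained:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Inline stand-ins for train.csv / teste.csv, copied from the question's sample.
texts = ["dogs are cool", "flowers are beautifil", "pen is mine", "beyonce is an artist"]
df_train = pd.DataFrame({"text": texts, "label": ["animal", "plants", "objet", "person"]})
df_test = pd.DataFrame({"text": texts, "label": ["objet", "plants", "person", "animal"]})

# One vectorizer: fit on the training texts, then only transform the test texts.
vec = TfidfVectorizer()
X_train = vec.fit_transform(df_train["text"])
X_test = vec.transform(df_test["text"])

# Fit the classifier before asking it for predictions.
clf = LogisticRegression()
clf.fit(X_train, df_train["label"])
y_pred = clf.predict(X_test)

# Compare the predictions with the (possibly wrong) labels of the test file.
test_accuracy = accuracy_score(df_test["label"], y_pred)
print("Accuracy on testing set:", test_accuracy)
print("Classification Report:")
print(classification_report(df_test["label"], y_pred))
```

Since the test file holds the same texts with different labels, the report here measures how far the test labels deviate from what the model learned from the corrected ones.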


1
vote

Before you can call the predict function on X_test, you must first train the classifier object on X_train. Something like this:

clf = LogisticRegression(penalty='l2', multi_class = 'multinomial',solver ='newton-cg')

#Then train the classifier over training data
clf.fit(X_train, y_train)

#Then use predict function to make predictions
y_pred = clf.predict(X_test)
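
If you want to reproduce the error in isolation, the following sketch calls predict before and after fit on a tiny made-up array (the data and variable names are hypothetical, and default LogisticRegression settings are assumed):

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two features, two classes.
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.9], [0.9, 0.0]])
y = np.array(["a", "b", "a", "b"])

clf = LogisticRegression()

# Calling predict on an unfitted estimator raises NotFittedError.
try:
    clf.predict(X)
    raised_before_fit = False
except NotFittedError:
    raised_before_fit = True

# After fit, the same call works.
clf.fit(X, y)
y_pred = clf.predict(X)

print("raised before fit:", raised_before_fit)
print("predictions after fit:", y_pred)
```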