Tf-idf is giving a ValueError; it works fine until it throws the error
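from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
tf_idf_vectorizer = TfidfVectorizer(ngram_range=(2, 2))
tf_train = tf_idf_vectorizer.fit_transform(X_train)
tf_test = tf_idf_vectorizer.transform(X_test)
model = LogisticRegression()
model.fit(X_train, y_train)
y_predict = model.predict(X_test)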
ValueError: X has 97624 features per sample; expecting 11
It should be model.fit(tf_train, y_train), followed by model.predict(tf_test).
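from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
tf_idf_vectorizer = TfidfVectorizer(ngram_range=(2, 2))
# Learn the bigram vocabulary on the training text only
tf_train = tf_idf_vectorizer.fit_transform(X_train)
# Reuse that vocabulary for the test text
tf_test = tf_idf_vectorizer.transform(X_test)
model = LogisticRegression()
# Fit and predict on the transformed matrices, not the raw text
model.fit(tf_train, y_train)
y_predict = model.predict(tf_test)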
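You call fit_transform on the training input, which gives you tf_train, so the model has to be fitted on tf_train. Likewise, model.predict has to be applied to the transformed test input, i.e. tf_test.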
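One way to make this mistake harder to repeat is to wrap the vectorizer and the classifier in a scikit-learn Pipeline, so that fit and predict both take raw text and the transformation happens internally. This is a swapped-in alternative to the fix above, not what the question used, and the toy X_train, y_train and X_test below are hypothetical stand-ins for the real data. A minimal sketch:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the real X_train, y_train, X_test.
X_train = [
    "the movie was really good",
    "the movie was really bad",
    "really good acting overall",
    "really bad acting overall",
]
y_train = [1, 0, 1, 0]
X_test = ["the acting was really good", "the plot was really bad"]

# The pipeline vectorizes and classifies in one object, so fit and predict
# always see features produced by the same fitted vectorizer.
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(2, 2)),
    LogisticRegression(),
)
pipeline.fit(X_train, y_train)        # raw text in, vectorized internally
y_predict = pipeline.predict(X_test)  # same transformation applied to test text
print(y_predict)
```

With this layout there is no separate tf_train or tf_test to mix up with the raw inputs.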
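As a sanity check, compare the sizes of what you pass to fit and to predict, for example with len(X_train) and len(X_test), or with tf_train.shape and tf_test.shape. The numbers 97624 and 11 in the message are the two sizes that do not match, and that mismatch is where this error comes from:
ValueError: X has 97624 features per sample; expecting 11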
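A sanity check along these lines could be sketched as follows. The toy X_train and X_test are hypothetical; the point is only that the matrices returned by fit_transform and transform share the same number of columns, which is what the model relies on at predict time.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data standing in for the real X_train and X_test.
X_train = [
    "the movie was really good",
    "the movie was really bad",
    "really good acting overall",
]
X_test = ["the acting was really good", "the plot was really bad"]

vectorizer = TfidfVectorizer(ngram_range=(2, 2))
tf_train = vectorizer.fit_transform(X_train)
tf_test = vectorizer.transform(X_test)

# Rows are samples, columns are bigram features learned from X_train.
print(tf_train.shape)
print(tf_test.shape)

# The column counts must agree, otherwise predict raises a ValueError.
assert tf_train.shape[1] == tf_test.shape[1]
```

If the two column counts differ, or if the raw text is passed where a matrix is expected, the model sees a different feature layout at predict time than it saw at fit time.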
P.S.: Take a close look at https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html