I applied an SVM to an English dataset and it works really well. But when I apply it to a CSV file whose data is not in English, it raises an error.
import pandas as pd
data = pd.read_csv('love.csv',encoding="ISO-8859-1")
import numpy as np
numpy_array = data.to_numpy()
X = numpy_array[0:,1]
Y = numpy_array[:,2]
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.12, random_state=10)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
text_clf = Pipeline([
    ('vect', CountVectorizer(stop_words='english')),
    ('tfidf', TfidfTransformer()),
    ('clf-svm', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-1, random_state=5)),
])
text_clf = text_clf.fit(X_train,Y_train)
predicted = text_clf.predict(X_test)
print(predicted)
accuracy=np.mean(predicted == Y_test)*100
print(accuracy)
My CSV file has 3 columns, and all row values are in Nepali (not English). How can I apply the SVM algorithm to such non-English datasets?
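For reference, here is a minimal sketch of how the same pipeline might be adapted for Nepali text. The error message is not shown above, so this is a guess at the likely causes rather than a confirmed fix. It rests on three assumptions: the file is actually stored as UTF-8 (ISO-8859-1 cannot represent Devanagari, so reading with that encoding yields garbled characters); the English stop-word list is simply dropped; and tokens are split on whitespace, because the default `token_pattern` relies on `\w`, which in Python does not match Devanagari vowel signs and so tends to break words apart. The sentences and the `pos`/`neg` labels are hypothetical toy data standing in for the real CSV columns.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the text and label columns.
# With the real file, one would presumably load it as UTF-8 instead, e.g.:
#   data = pd.read_csv('love.csv', encoding='utf-8')
texts = ["यो राम्रो छ", "म खुसी छु", "यो नराम्रो छ", "म दुखी छु"]
labels = ["pos", "pos", "neg", "neg"]

# Assumption: whitespace tokenization is good enough as a first attempt.
# Punctuation attached to words is not stripped by this pattern.
vect = CountVectorizer(
    stop_words=None,           # the built-in 'english' list does not apply to Nepali
    token_pattern=r"(?u)\S+",  # keep every whitespace-delimited token whole
)

text_clf = Pipeline([
    ('vect', vect),
    ('tfidf', TfidfTransformer()),
    ('clf-svm', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-1, random_state=5)),
])
text_clf.fit(texts, labels)

# Inspect the learned vocabulary to check that whole words survived tokenization.
vocab = text_clf.named_steps['vect'].vocabulary_
print(sorted(vocab))

predicted = text_clf.predict(texts)
print(predicted)
```

Printing the vocabulary is a quick sanity check: if it contains fragments or garbled characters rather than whole Nepali words, the encoding or the tokenization is still off. A Nepali-specific stop-word list could be passed to `stop_words` as a plain Python list if one is available.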