How do I apply a support vector machine (SVM) to a non-English dataset file?


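I applied an SVM to an English dataset and it worked really well. But when I apply it to CSV files that contain non-English data, it raises an error.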
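
One thing worth checking first is the encoding. The code below reads the file with `encoding="ISO-8859-1"`, but ISO-8859-1 (Latin-1) only covers code points U+0000 to U+00FF and cannot represent Devanagari. If `love.csv` is saved as UTF-8 (an assumption; the question does not say), Latin-1 decoding never fails. It silently turns each Devanagari character into several Latin-1 symbols, so the model never sees real Nepali words. A minimal sketch of the effect, using a made-up Nepali sentence that is not from the original dataset:

```python
# Hypothetical Nepali sample, for illustration only.
text = "यो राम्रो छ"

raw = text.encode("utf-8")            # the bytes a UTF-8 CSV file would contain
as_latin1 = raw.decode("ISO-8859-1")  # what encoding="ISO-8859-1" produces
as_utf8 = raw.decode("utf-8")         # what encoding="utf-8" produces

print(repr(as_latin1))   # mojibake: one Latin-1 character per byte
print(as_utf8 == text)   # UTF-8 decoding round-trips the original text
print(len(text), len(raw), len(as_latin1))
```

Latin-1 maps every byte to exactly one character, which is why it never raises a decoding error even on the wrong file.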

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Load the CSV. Note: ISO-8859-1 (Latin-1) cannot represent Devanagari characters,
# so a UTF-8 Nepali file read with this encoding turns into mojibake.
data = pd.read_csv('love.csv', encoding="ISO-8859-1")
numpy_array = data.to_numpy()
X = numpy_array[:, 1]  # column 1 (presumably the text)
Y = numpy_array[:, 2]  # column 2 (presumably the label)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.12, random_state=10)

# Bag of words -> TF-IDF weighting -> linear SVM (SGDClassifier with hinge loss).
# stop_words='english' only removes English stop words.
text_clf = Pipeline([
    ('vect', CountVectorizer(stop_words='english')),
    ('tfidf', TfidfTransformer()),
    ('clf-svm', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-1, random_state=5)),
])

text_clf = text_clf.fit(X_train, Y_train)

predicted = text_clf.predict(X_test)
print(predicted)
accuracy = np.mean(predicted == Y_test) * 100  # accuracy in percent
print(accuracy)
```
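
Even with the right encoding, two `CountVectorizer` settings in this pipeline are suspect for Nepali. `stop_words='english'` only filters English words, and the default `token_pattern`, `r"(?u)\b\w\w+\b"`, relies on Python's `\w`, which does not match Devanagari vowel signs or the virama (they are combining marks, not letters). As a result, words can be cut into fragments or dropped. One workaround, sketched below under the assumption that words are separated by whitespace, is a whitespace-based `token_pattern` plus an optional Nepali stop-word list. The sentences and the stop-word list here are hypothetical placeholders, not a vetted resource:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up Nepali sentences for illustration only.
docs = ["यो राम्रो छ", "यो नराम्रो छ", "म खुसी छु"]

# Default tokenizer r"(?u)\b\w\w+\b": Python's \w skips Devanagari vowel signs
# and the virama, so words are split into fragments or lost entirely.
default_vocab = CountVectorizer().fit(docs).vocabulary_
print(sorted(default_vocab))

# Whitespace tokenizer: every run of non-space characters is one token,
# so Devanagari words stay intact (assumes space-separated words).
# Hypothetical stop-word list; replace it with a real Nepali list if available.
nepali_stop_words = ["यो", "छ", "छु", "म"]
vect = CountVectorizer(token_pattern=r"(?u)\S+", stop_words=nepali_stop_words)
vocab = vect.fit(docs).vocabulary_
print(sorted(vocab))
```

Passing `tokenizer=str.split` would be a similar alternative; `token_pattern` keeps the rest of `CountVectorizer`'s preprocessing unchanged.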

My CSV file has 3 columns, and every row value is in Nepali (not English). How can I apply the SVM algorithm to such non-English datasets?
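
Putting the pieces together, here is a minimal end-to-end sketch of the question's pipeline adapted for Nepali. It assumes the file is UTF-8 encoded, that column 1 holds the text and column 2 the label, and that words are whitespace-separated. A small in-memory CSV with made-up rows stands in for `love.csv`, and the column names `id`, `text`, `label` are hypothetical. `TfidfVectorizer` replaces the `CountVectorizer` + `TfidfTransformer` pair, which scikit-learn documents as equivalent; the classifier is the same hinge-loss `SGDClassifier` (a linear SVM) as in the question.

```python
import io

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Made-up stand-in for love.csv: three columns (id, text, label), text in Nepali.
csv_text = """id,text,label
1,यो राम्रो छ,positive
2,म खुसी छु,positive
3,यो धेरै राम्रो छ,positive
4,म धेरै खुसी छु,positive
5,यो नराम्रो छ,negative
6,म दुखी छु,negative
7,यो धेरै नराम्रो छ,negative
8,म धेरै दुखी छु,negative
"""
csv_bytes = csv_text.encode("utf-8")  # simulate a UTF-8 file on disk

# With the real file: pd.read_csv("love.csv", encoding="utf-8")
data = pd.read_csv(io.BytesIO(csv_bytes), encoding="utf-8")
X = data.iloc[:, 1].to_numpy()  # text column
Y = data.iloc[:, 2].to_numpy()  # label column

# stratify keeps both classes in the tiny training split.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=10, stratify=Y)

text_clf = Pipeline([
    # Whitespace tokens keep Devanagari words intact; no English stop words.
    ("tfidf", TfidfVectorizer(token_pattern=r"(?u)\S+")),
    # Linear SVM trained with SGD (hinge loss), same settings as the question.
    ("clf-svm", SGDClassifier(loss="hinge", penalty="l2", alpha=1e-1, random_state=5)),
])
text_clf.fit(X_train, Y_train)

predicted = text_clf.predict(X_test)
accuracy = np.mean(predicted == Y_test) * 100  # accuracy in percent
print(predicted)
print(accuracy)
```

The larger `test_size` and `stratify=Y` are only there because the toy data has eight rows; with the real file, the question's `test_size=0.12` works the same way.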

python machine-learning scikit-learn svm
© www.soinside.com 2019 - 2024. All rights reserved.