TypeError: '<' not supported between instances of 'int' and 'str'

Question (votes: -1, answers: 1)

I am building a machine learning algorithm for sentiment analysis, but I keep running into this error:

TypeError: '<' not supported between instances of 'int' and 'str'

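I have looked at other questions, but they only offer a single fix, and only for the variant "TypeError: '<' not supported between instances of 'str' and 'int'".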

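import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Column 2 holds the text; columns 3-5 hold the three label sets.
train_data = "C:/Users/User/Abhinav/TrumpStuff/trumpwords.csv"
Xwords = pd.read_csv(train_data, usecols=[2], header=None)
ywords_pos = pd.read_csv(train_data, usecols=[3], header=None)
ywords_neg = pd.read_csv(train_data, usecols=[4], header=None)
ywords_bad = pd.read_csv(train_data, usecols=[5], header=None)

# getStringArray is my own helper; its definition is not shown here.
count_vect = CountVectorizer()
Xtrain_counts = count_vect.fit_transform(getStringArray(Xwords))
tfidf_transformer = TfidfTransformer()
Xtrain_tfidf = tfidf_transformer.fit_transform(Xtrain_counts)

# One Naive Bayes classifier per label column; the first fit() raises the error.
clf_positive = MultinomialNB().fit(Xtrain_tfidf, ywords_pos)
clf_negative = MultinomialNB().fit(Xtrain_tfidf, ywords_neg)
clf_bad = MultinomialNB().fit(Xtrain_tfidf, ywords_bad)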

"""
 My data is from https://data.world/lovesdata/trump-tweets-5-4-09-12-5-16/workspace/file?filename=trumpwords.xlsx
"""

I expect the code to run and give me a sentiment, but at the moment I cannot get past this error. Here is the error:

D:\WPy-3702\python-3.7.0\lib\site-packages\sklearn\utils\validation.py:578: 
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-5775276c3452> in <module>()
----> 1 clf_positive = MultinomialNB().fit(Xtrain_tfidf, ywords_pos)
      2 clf_negative = MultinomialNB().fit(Xtrain_tfidf, ywords_neg)
      3 clf_bad = MultinomialNB().fit(Xtrain_tfidf, ywords_bad)

D:\WPy-3702\python-3.7.0\lib\site-packages\sklearn\naive_bayes.py in fit(self, X, y, sample_weight)
    581 
    582         labelbin = LabelBinarizer()
--> 583         Y = labelbin.fit_transform(y)
    584         self.classes_ = labelbin.classes_
    585         if Y.shape[1] == 1:

D:\WPy-3702\python-3.7.0\lib\site-packages\sklearn\preprocessing\label.py in fit_transform(self, y)
    305             Shape will be [n_samples, 1] for binary problems.
    306         """
--> 307         return self.fit(y).transform(y)
    308 
    309     def transform(self, y):

D:\WPy-3702\python-3.7.0\lib\site-packages\sklearn\preprocessing\label.py in fit(self, y)
    274         self : returns an instance of self.
    275         """
--> 276         self.y_type_ = type_of_target(y)
    277         if 'multioutput' in self.y_type_:
    278             raise ValueError("Multioutput target data is not supported with "

D:\WPy-3702\python-3.7.0\lib\site-packages\sklearn\utils\multiclass.py in type_of_target(y)
    285         return 'continuous' + suffix
    286 
--> 287     if (len(np.unique(y)) > 2) or (y.ndim >= 2 and len(y[0]) > 1):
    288         return 'multiclass' + suffix  # [1, 2, 3] or [[1., 2., 3]] or [[1, 2]]
    289     else:

D:\WPy-3702\python-3.7.0\lib\site-packages\numpy\lib\arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
    221     ar = np.asanyarray(ar)
    222     if axis is None:
--> 223         return _unique1d(ar, return_index, return_inverse, return_counts)
    224     if not (-ar.ndim <= axis < ar.ndim):
    225         raise ValueError('Invalid axis kwarg specified for unique')

D:\WPy-3702\python-3.7.0\lib\site-packages\numpy\lib\arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
    281         aux = ar[perm]
    282     else:
--> 283         ar.sort()
    284         aux = ar
    285     flag = np.concatenate(([True], aux[1:] != aux[:-1]))

TypeError: '<' not supported between instances of 'int' and 'str'
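
The last frames of the traceback show `np.unique(y)` calling `ar.sort()` on the labels, so the comparison that fails is between label values. That suggests at least one label column holds both integers and strings. Below is a minimal sketch of the failure in plain Python. The label values are made up for illustration, since the real contents of trumpwords.csv are not shown here.

```python
# Hypothetical label values: one text entry mixed in with integer labels.
labels = [1, 0, "positive", 1]

try:
    sorted(labels)  # sorting compares elements with '<', as ar.sort() does
except TypeError as exc:
    message = str(exc)
    print(message)

# Casting every label to a single type makes the values comparable again.
clean_labels = [str(label) for label in labels]
unique_labels = sorted(set(clean_labels))
print(unique_labels)
```

With pandas, a comparable cleanup might be `ywords_pos.iloc[:, 0].astype(str)`, which also gives the 1-D shape that the DataConversionWarning asks for. Whether strings or numbers are the right target type depends on what the column really contains.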
python machine-learning scikit-learn naivebayes
1 Answer
0 votes

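You are using CountVectorizer to vectorize your data and then passing that result to TfidfVectorizer. You cannot give integer data to TfidfVectorizer. If you want to use TfidfVectorizer, apply it directly to your text. CountVectorizer and TfidfVectorizer are two different ways of vectorizing text data so that it can be fed to your model. I suggest reading their documentation to understand what each one does.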
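
For reference, the scikit-learn documentation describes TfidfVectorizer as equivalent to CountVectorizer followed by TfidfTransformer, and TfidfTransformer is the class the question's code actually uses. The sketch below applies both routes to a few made-up sentences (not the asker's data) with default parameters and compares the results.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Made-up example documents, for illustration only.
docs = ["great win tonight", "sad and bad news", "great news"]

# Route 1: token counts first, then TF-IDF weighting of the count matrix.
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts)

# Route 2: TfidfVectorizer applied directly to the raw text.
one_step = TfidfVectorizer().fit_transform(docs)

same = np.allclose(two_step.toarray(), one_step.toarray())
print(same)
```

Either route produces a feature matrix for `MultinomialNB().fit`. Neither one touches the label argument `y`.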

Hope this helps!
