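I'm trying to train a Naive Bayes classifier to predict whether movie reviews are good or bad. I'm following this tutorial, but I get an error when I try to train the model.
I have followed every step up to the point of training the model. My data and code look like this: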
Reviews Labels
0 For fans of Chris Farley, this is probably his... 1
1 Fantastic, Madonna at her finest, the film is ... 1
2 From a perspective that it is possible to make... 1
3 What is often neglected about Harold Lloyd is ... 1
4 You'll either love or hate movies such as this... 1
... ...
14995 This is perhaps the worst movie I have ever se... 0
14996 I was so looking forward to seeing this film t... 0
14997 It pains me to see an awesome movie turn into ... 0
14998 "Grande Ecole" is not an artful exploration of... 0
14999 I felt like I was watching an example of how n... 0
gnb = MultinomialNB()
gnb.fit(all_train_set['Reviews'], all_train_set['Labels'])
However, when I try to fit the model, I get this error:
ValueError: could not convert string to float: 'For fans of Chris Farley, this is probably his best film. David Spade pl
If anyone can help me work out why following the tutorial goes wrong here, I would appreciate it.
Many thanks
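With Scikit-learn, you have to convert the text to numbers before calling a classifier. MultinomialNB expects a numeric feature matrix, which is why passing the raw review strings raises "could not convert string to float". You can do the conversion with, for instance, CountVectorizer or TfidfVectorizer.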
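As a minimal sketch of that idea (not necessarily what the tutorial does), the vectorizer and the classifier can be chained in a pipeline so that the raw strings are converted to word counts before they reach MultinomialNB. The four reviews and labels below are made-up stand-ins for all_train_set['Reviews'] and all_train_set['Labels'].

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy data standing in for all_train_set['Reviews'] / all_train_set['Labels'].
reviews = [
    "a fantastic film, I loved it",
    "wonderful acting and a great story",
    "the worst movie I have ever seen",
    "boring, awful and a waste of time",
]
labels = [1, 1, 0, 0]

# CountVectorizer turns each review into a vector of word counts,
# so the classifier only ever sees numbers.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)

# New text goes through the same vectorizer automatically.
predictions = model.predict(["a great, wonderful film", "an awful, boring movie"])
print(predictions)
```

With the real data, the same call would presumably be model.fit(all_train_set['Reviews'], all_train_set['Labels']). Swapping CountVectorizer() for TfidfVectorizer() (imported from the same module) gives TF-IDF weights instead of raw counts.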
If you want to use more modern word embeddings, you can use the Zeugma package (install it from a terminal first, e.g. with pip install zeugma).
I hope this helps!