The data looks like this:
data_clean2.head(3)
                                                text  target
0   [deed, reason, earthquak, may, allah, forgiv, u]       1
1        [forest, fire, near, la, rong, sask, canada]       1
2  [resid, ask, shelter, place, notifi, offic, evacu, shelter, place, order, expect]       1
I got these tokens by stemming and lemmatizing the sentences and then tokenizing them. (Hopefully that was the right approach.)
Now I want to use:
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(data_clean2['text'])
It gives me the following error:
AttributeError                            Traceback (most recent call last)
<ipython-input-140-6f68d1115c5f> in <module>
      1 vectorizer = TfidfVectorizer()
----> 2 vectors = vectorizer.fit_transform(data_clean2['text'])

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
   1650         """
   1651         self._check_params()
-> 1652         X = super().fit_transform(raw_documents)
   1653         self._tfidf.fit(X)
   1654         # X is already a transformed view of raw_documents so

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
   1056
   1057         vocabulary, X = self._count_vocab(raw_documents,
-> 1058                                           self.fixed_vocabulary_)
   1059
   1060         if self.binary:

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
    968         for doc in raw_documents:
    969             feature_counter = {}
--> 970             for feature in analyze(doc):
    971                 try:
    972                     feature_idx = vocabulary[feature]

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in <lambda>(doc)
    350                                                tokenize)
    351         return lambda doc: self._word_ngrams(
--> 352             tokenize(preprocess(self.decode(doc))), stop_words)
    353
    354         else:

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in <lambda>(x)
    254
    255         if self.lowercase:
--> 256             return lambda x: strip_accents(x.lower())
    257         else:
    258             return strip_accents

AttributeError: 'list' object has no attribute 'lower'
I understand that it can't be used on lists like this, so what should I do here? Do I need to join the lists back into strings?
Yes, first convert each token list back into a string using:
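The code that followed the answer appears to be cut off. A minimal sketch of the usual fix, joining each token list back into one space-separated string before vectorizing (the `data_clean2` frame below is a stand-in built from the rows shown in the question):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in for data_clean2: each 'text' entry is a list of stemmed tokens.
data_clean2 = pd.DataFrame({
    'text': [
        ['deed', 'reason', 'earthquak', 'may', 'allah', 'forgiv', 'u'],
        ['forest', 'fire', 'near', 'la', 'rong', 'sask', 'canada'],
    ],
    'target': [1, 1],
})

# Join each token list into a single space-separated string,
# which is the raw-document format TfidfVectorizer expects.
data_clean2['text_joined'] = data_clean2['text'].apply(' '.join)

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(data_clean2['text_joined'])
print(vectors.shape)  # one row per document, one column per vocabulary term
```

Alternatively, passing a callable analyzer, e.g. `TfidfVectorizer(analyzer=lambda doc: doc)`, makes the vectorizer accept the pre-tokenized lists directly and bypasses its own preprocessing (which is where `.lower()` failed). Note also that the default `token_pattern` drops single-character tokens such as `u`.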