How do I remove stop words in gensim?

Question

df_clean['message'] = df_clean['message'].apply(lambda x: gensim.parsing.preprocessing.remove_stopwords(x))

I tried this on the "message" column of my DataFrame, but I get an error:

TypeError: decoding to str: need a bytes-like object, list found

Answer 1

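Evidently the df_clean["message"] column contains lists of words rather than strings, which is why the error says need a bytes-like object, list found.

To fix this, join each list back into a single string with join() before removing the stop words: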

df_clean['message'] = df_clean['message'].apply(lambda x: gensim.parsing.preprocessing.remove_stopwords(" ".join(x)))

Note that after running the code above, df_clean["message"] will contain string objects rather than lists.
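
If the column might mix lists and plain strings, a small helper can normalize each value before it reaches remove_stopwords. This is only a sketch: the name tokens_to_text is hypothetical (it is not part of gensim or pandas), and it assumes tokens should be joined with single spaces.

```python
def tokens_to_text(value):
    """Return a space-joined string for a list/tuple of tokens.

    Strings are passed through unchanged. Hypothetical helper,
    not part of gensim or pandas.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, (list, tuple)):
        # str(token) guards against non-string tokens such as numbers
        return " ".join(str(token) for token in value)
    raise TypeError(f"expected str or list of tokens, got {type(value).__name__}")


# Intended use with the question's DataFrame (assumes gensim is installed):
# df_clean['message'] = df_clean['message'].apply(
#     lambda x: gensim.parsing.preprocessing.remove_stopwords(tokens_to_text(x)))
```

Raising on anything that is neither a string nor a list means unexpected values such as NaN are reported instead of being silently converted.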

Answer 2

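This is not really a gensim problem. The error surfaces through the pandas apply call because a value in your message column is a list rather than a string. Here is a minimal pandas example: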

import pandas as pd
from gensim.parsing.preprocessing import remove_stopwords
df = pd.DataFrame([['one', 'two'], ['three', ['four']]], columns=['A', 'B'])
df.A.apply(remove_stopwords) # works fine

df.B.apply(remove_stopwords)

TypeError: decoding to str: need a bytes-like object, list found

Answer 3

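The error means that remove_stopwords expects a string object and you are passing it a list. So before removing stop words, check that every value in the column is a string. See the documentation.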
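
One way to run that check is sketched below. The helper name find_non_strings is hypothetical; it accepts any iterable of values, so a pandas Series such as df_clean['message'] should work as well as a plain list.

```python
def find_non_strings(values):
    """Return (position, type name) for every value that is not a str.

    Hypothetical helper for illustration; positions are 0-based and
    follow iteration order, not the DataFrame index labels.
    """
    return [
        (position, type(value).__name__)
        for position, value in enumerate(values)
        if not isinstance(value, str)
    ]


# Example with a plain list standing in for the column values
sample = ["one two", ["four"], "three"]
print(find_non_strings(sample))
```

An empty result means every value is a string and remove_stopwords can be applied directly.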
