Remove meaningless words from a dataframe column

Question

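A dataframe column contains sentences with a few meaningless two- and three-letter words. I want to find all such words in the dataframe column and then remove them from the column. Example df: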

id      text
1       happy birthday syz
2       vz
3       have a good bne weekend

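I want to 1) find all words of length 3 or less (this would return syz, vz, bne), and 2) remove those words. (Note that stopwords have already been removed, so words like "a" and "the" aren't present in the real dataframe column; the dataframe above is just an example.)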
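Step 1 can be sketched with pandas string methods. This is a minimal sketch, assuming the example dataframe above and words separated by whitespace: `Series.str.split` turns each sentence into a list of words, and `Series.explode` (pandas 0.25+) flattens those lists to one word per row so the short ones can be filtered. Because the toy example still contains "a", it shows up in the result alongside syz, vz and bne.

```python
import pandas as pd

# The example dataframe from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["happy birthday syz", "vz", "have a good bne weekend"],
})

# Split each sentence on whitespace and flatten to one word per row.
words = df["text"].str.split().explode()

# Keep the words that are 3 characters or shorter.
short_words = words[words.str.len() <= 3].tolist()
print(short_words)  # ['syz', 'vz', 'a', 'bne']
```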

I tried the code below, but it doesn't work:

def word_length(text):
    words = []
    for word in text:
        if len(word) <= 3:
            words.append(word)
    return(words)

short_words = df['text'].apply(word_length).sum()

The output should be:

id      text
1       happy birthday 
2       
3       have good weekend

Answer 1

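You are applying the function as though the column held sequences of words, but the actual data is a column of strings (sequences of characters), so `for word in text` loops over individual characters. You should also drop the .sum(), since it is completely redundant.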
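A quick way to see the problem is to run the question's `word_length` function, unchanged, on two sample strings (the sample values are made up for illustration). Iterating over a Python string yields single characters, and every character, spaces included, passes the `len(word) <= 3` check:

```python
import pandas as pd

def word_length(text):
    # The question's original function, unchanged.
    words = []
    for word in text:  # iterates over characters, not words
        if len(word) <= 3:
            words.append(word)
    return words

s = pd.Series(["vz", "have a"])
per_row = s.apply(word_length)
print(per_row.tolist())
# [['v', 'z'], ['h', 'a', 'v', 'e', ' ', 'a']]
```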

Rewrite the function you apply in this form, and assign the result back to the column:

def filter_short_words(text):
    return " ".join([w for w in text.split() if len(w) > 3])  # keep words longer than 3 chars

df['text'] = df['text'].apply(filter_short_words)

This works.
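A minimal end-to-end check on the question's example dataframe, using the same `filter_short_words` logic. Since only words longer than 3 characters are kept, "a" is removed as well, and row 2 becomes an empty string; the result matches the expected output in the question.

```python
import pandas as pd

def filter_short_words(text):
    # Keep only words longer than 3 characters, rejoined with single spaces.
    return " ".join(w for w in text.split() if len(w) > 3)

df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["happy birthday syz", "vz", "have a good bne weekend"],
})
df["text"] = df["text"].apply(filter_short_words)
print(df["text"].tolist())  # ['happy birthday', '', 'have good weekend']
```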
