Getting the length of text in a DataFrame in Python

Question (votes: 0, answers: 1)

So I have this DataFrame:

    Text                                             target
    #Coronavirus is a cover for something else. #5...   D
    Crush the One Belt One Road !! \r\n#onebeltonf...   B
    RT @nickmyer: It seems to be, #COVID-19 aka #c...   B
    @Jerusalem_Post All he knows is how to destroy...   B
    @newscomauHQ Its gonna show us all. We will al...   B

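where Text contains tweets. I am trying to get the length of each string in the Text column and store those lengths in a new column of the DataFrame. This is what I tried: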

import pandas as pd
d = pd.read_csv('5gCoronaFinal.csv')
d['textlength'] = [len(t) for t in d['Text']]

But it keeps giving me this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-dabcab1de7b2> in <module>
----> 1 d['textlength'] = [len(t) for t in d['Text']]

<ipython-input-42-dabcab1de7b2> in <listcomp>(.0)
----> 1 d['textlength'] = [len(t) for t in d['Text']]

TypeError: object of type 'float' has no len()

I also tried converting t to an integer, like this:

d['textlength'] = [len(int(t)) for t in d['Text']]

But that gave me this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-9ae56e5f7912> in <module>
----> 1 d['textlength'] = [len(int(t)) for t in d['Text']]

<ipython-input-43-9ae56e5f7912> in <listcomp>(.0)
----> 1 d['textlength'] = [len(int(t)) for t in d['Text']]

ValueError: invalid literal for int() with base 10: '#Coronavirus is a cover for something else. #5g is being rolled out and they are expecting lots to...what? Die from #60ghz +. They look like they are to keep the cold in? #socialdistancing #covid19 #

I need some help, thanks!

python pandas dataframe
1 Answer
1 vote

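You can use the str accessor for vectorized string operations. In this case, you can use str.len to get the number of characters in each entry. If you want a word count instead, chain str.split with str.len.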
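As a minimal sketch of that suggestion: the TypeError in the question most likely means at least one cell in Text is missing, since pandas represents missing values as NaN, which is a float. The small DataFrame below is a hypothetical stand-in for 5gCoronaFinal.csv, and the column names wordcount and textlength_int are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV in the question; one row has no text.
d = pd.DataFrame({
    "Text": ["#Coronavirus is a cover", np.nan, "RT @nickmyer: hello"],
    "target": ["D", "B", "B"],
})

# Character count per tweet. Missing values stay NaN instead of raising TypeError.
d["textlength"] = d["Text"].str.len()

# Word count per tweet: split on whitespace, then take the length of each list.
d["wordcount"] = d["Text"].str.split().str.len()

# Optional: treat missing text as an empty string so the result is all integers.
d["textlength_int"] = d["Text"].fillna("").str.len()

print(d)
```

Whether a missing tweet should count as length 0 or stay NaN depends on what you do with the column afterwards; the fillna("") variant is only one possible choice.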