Python: drop rows if a DataFrame cell value contains fewer than 5 characters

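Question  Votes: 0  Answers: 3

I have a dataframe like the one below, and I am trying to keep only the rows whose value has more than 5 characters. This is what I tried, but it strips out the words 'of', 'U.', 'and', 'Arts' and so on. I only need to drop the rows whose value has a length of less than 5 characters.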

id schools
1  University of Hawaii
2  Dept in Colorado U.
3  Dept
4  College of Arts and Science
5  Dept
6  Bldg

My code produces the wrong output:

0    University Hawaii
1             Colorado
2                     
3      College Science
4                     
5   

I am looking for output like this:

id schools
1  University of Hawaii
2  Dept in Colorado U.
4  College of Arts and Science

Code:

import pandas as pd

l = [1,2,3,4,5,6]
s = ['University of Hawaii', 'Dept in Colorado U.','Dept','College of Arts and Science','Dept','Bldg']
df1 = pd.DataFrame({'id':l, 'schools':s})
# not working: findall matches individual words of 5+ characters, so shorter words are dropped
df1 = df1['schools'].str.findall(r'\w{5,}').str.join(' ')
df1
python python-3.x
3 Answers
2
votes

Using a regular expression is huge (and slow) overkill for this task. You can use simple pandas boolean indexing:

filtered_df = df1[df1['schools'].str.len() > 5]  # or >= depending on the required logic
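As a minimal sketch of how this plays out on the data from the question (the intermediate name `lengths` is introduced here only for illustration): `str.len()` measures each whole cell value rather than each word, so the comparison yields one boolean per row. This version assumes "fewer than 5 characters" means rows of length 4 or less should go, hence `>= 5`; with this particular data `> 5` selects the same rows.

```python
import pandas as pd

# Sample data taken from the question
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'schools': ['University of Hawaii', 'Dept in Colorado U.', 'Dept',
                'College of Arts and Science', 'Dept', 'Bldg'],
})

# Length of each whole cell value (not of each word inside it)
lengths = df1['schools'].str.len()

# Keep only the rows whose value has at least 5 characters
filtered_df = df1[lengths >= 5]
print(filtered_df)
```

The original row labels are kept; chain `.reset_index(drop=True)` if a fresh 0-based index is preferred.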

0
votes

There is a simpler filter for your data.

mask = df1['schools'].str.len() > 5

Then create a new dataframe from the filter:

df2 = df1[mask].copy()
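A rough sketch of why the `.copy()` can be worth having, assuming you plan to modify the filtered frame afterwards (the `n_chars` column is a hypothetical addition, not something from the question): the explicit copy makes `df2` an independent object, so later assignments do not touch `df1` and do not trigger pandas' `SettingWithCopyWarning`.

```python
import pandas as pd

# Sample data taken from the question
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'schools': ['University of Hawaii', 'Dept in Colorado U.', 'Dept',
                'College of Arts and Science', 'Dept', 'Bldg'],
})

# Boolean mask: True for rows whose value is longer than 5 characters
mask = df1['schools'].str.len() > 5

# Independent copy of the selected rows
df2 = df1[mask].copy()

# Hypothetical follow-up step: add a column to the filtered frame only
df2['n_chars'] = df2['schools'].str.len()

print(df2)
print(df1.columns.tolist())  # df1 is left unchanged
```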

-1
votes
import pandas as pd
name = ['University of Hawaii', 'Dept in Colorado U.', 'Dept', 'College of Arts and Science', 'Dept', 'Bldg']

labels = ['schools']
df = pd.DataFrame.from_records([[i] for i in name], columns=labels)
df[df['schools'].str.len() > 5]