Python: Remove items with fewer than 4 characters


I have a dataframe that looks like this:

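I need to remove items with fewer than 4 characters from the CityIds column. A comma may or may not be followed by a space, and there are thousands of elements in the column.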

CityIds
98765, 98-oki, th6, iuy89, 8.90765
89ol, gh98.0p, klopi, th, loip
98087,PAKJIYT, hju, yu8oi, iupli

For example: I want to remove th6, or show th6 in a separate column.

python pandas character items
2 Answers
0 votes

Extract the items whose length is 4 or more and join them back together:

df['CityIds'] = df['CityIds'].str.findall(r'([^\s,]{4,})').str.join(', ')

                         CityIds
0  98765, 98-oki, iuy89, 8.90765
1     89ol, gh98.0p, klopi, loip
2   98087, PAKJIYT, yu8oi, iupli
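To see what the pattern does on its own, here is a minimal sketch using the standard-library `re` module on one row from the question. The character class `[^\s,]` matches anything that is neither whitespace nor a comma, and `{4,}` requires at least four such characters in a row, so shorter tokens never match. The variable names `pattern`, `row`, and `kept` are illustrative only.

```python
import re

# A run of 4 or more characters that are neither whitespace nor commas
pattern = re.compile(r"[^\s,]{4,}")

# Third row from the question (note the missing space after the first comma)
row = "98087,PAKJIYT, hju, yu8oi, iupli"

# findall returns every non-overlapping match; "hju" is too short to match
kept = pattern.findall(row)
print(kept)

# Joining with ", " also normalizes the spacing between items
print(", ".join(kept))
```

Because the separators are excluded from the match, it does not matter whether a comma is followed by a space.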

0 votes

The answer above is clearly cleaner; however, here I add a new column for the excluded IDs:

import pandas as pd

d = {'cityIDs': ['98765, 98-oki, th6, iuy89, 8.90765',
                 '89ol, gh98.0p, klopi, th, loip',
                 '98087, PAKJIYT, hju, yu8oi, iupli']}
df = pd.DataFrame(data=d)
df['rmvdIDs'] = ''  # will hold the IDs removed from each row
for i in df.index:
    # Drop all whitespace, then split on commas
    ids = "".join(df.loc[i, 'cityIDs'].split()).split(',')
    new_IDs = [x for x in ids if len(x) >= 4]
    # A list comprehension (not a set difference) keeps order and duplicates
    excl_IDs = [x for x in ids if len(x) < 4]
    # .loc avoids chained assignment, which may not write back to df
    df.loc[i, 'cityIDs'] = ", ".join(new_IDs)
    df.loc[i, 'rmvdIDs'] = ", ".join(excl_IDs)

print(df)

This prints:

                         cityIDs rmvdIDs
0  98765, 98-oki, iuy89, 8.90765     th6
1     89ol, gh98.0p, klopi, loip      th
2   98087, PAKJIYT, yu8oi, iupli     hju

-- Hope this helps
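For a frame with thousands of rows, the explicit loop can be replaced by tokenizing each row once and then filtering twice. The following is only a sketch of that idea, not either answerer's method: the `ShortIds` column name and the `MIN_LEN` constant are assumptions introduced for illustration, and the data is the sample from the question.

```python
import pandas as pd

df = pd.DataFrame({"CityIds": [
    "98765, 98-oki, th6, iuy89, 8.90765",
    "89ol, gh98.0p, klopi, th, loip",
    "98087,PAKJIYT, hju, yu8oi, iupli",
]})

MIN_LEN = 4  # hypothetical name for the minimum item length to keep

# Tokenize once: every run of characters that is not whitespace or a comma
tokens = df["CityIds"].str.findall(r"[^\s,]+")

# Short items go to their own column (hypothetical name), in original order
df["ShortIds"] = tokens.apply(
    lambda ts: ", ".join(t for t in ts if len(t) < MIN_LEN)
)

# Items that are long enough are joined back into the original column
df["CityIds"] = tokens.apply(
    lambda ts: ", ".join(t for t in ts if len(t) >= MIN_LEN)
)

print(df)
```

Filtering the token list directly, rather than taking a set difference, keeps the removed items in their original order and preserves duplicates.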
