Pandas: check prefixes and validate whether a searched prefix exists or has no data

Question (votes: 0, answers: 2)

I have the code snippet below, which works well.

import pandas as pd
import numpy as np

prefixes = ['sj00', 'sj12', 'cr00', 'cr08', 'eu00', 'eu50']
df = pd.read_csv('new_hosts', index_col=False, header=None)
df['prefix'] = df[0].str[:4]
df['grp'] = df.groupby('prefix').cumcount()
df = df.pivot(index='grp', columns='prefix', values=0)
# Raw string so the regex backslashes are not treated as Python escapes
df['sj12'] = df['sj12'].str.extract(r'(\w{2}\d{2}\w\*)', expand=True)
df = df[ prefixes ].dropna(axis=0, how='all').replace(np.nan, '', regex=True)
df = df.rename_axis(None)

Sample file new_hosts:

sj000001
sj000002
sj000003
sj000004
sj124000
sj125000
sj126000
sj127000
sj128000
sj129000
sj130000
sj131000
sj132000
cr000011
cr000012
cr000013
cr000014
crn00001
crn00002
crn00003
crn00004
euk000011
eu0000012
eu0000013
eu0000014
eu5000011
eu5000013
eu5000014
eu5000015

Current output:

sj00        sj12        cr00        cr08        eu00        eu50
sj000001                cr000011    crn00001    euk000011   eu5000011
sj000002                cr000012    crn00002    eu0000012   eu5000013
sj000003                cr000013    crn00003    eu0000013   eu5000014
sj000004                cr000014    crn00004    eu0000014   eu5000015

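What I expect:

  1. The code works, but as you can see in the current output, the second column has no values and still appears. How can I check whether a particular column has no values and, if so, remove it from the display?

  2. Can we check whether the `prefixes` exist in the DataFrame before processing, to avoid errors?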

python-3.x pandas numpy group-by
2 Answers
1
vote

IIUC, before

df = df[ prefixes ].dropna(axis=0, how='all').replace(np.nan, '', regex=True)

you can do:

# remove all empty columns
df = df.dropna(axis=1, how='all')
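
As a small illustration of that call, here is a minimal sketch on a toy frame. The data and the names `toy`, `empty_cols` and `cleaned` are made up for the example and do not come from new_hosts. One column is entirely NaN and one is only partly NaN, so the sketch shows which of them `dropna(axis=1, how='all')` removes.

```python
import numpy as np
import pandas as pd

# Toy frame: column 'sj12' is entirely NaN, 'cr00' is only partly NaN.
toy = pd.DataFrame({
    'sj00': ['sj000001', 'sj000002'],
    'sj12': [np.nan, np.nan],
    'cr00': ['cr000011', np.nan],
})

# Names of the columns that hold no data at all.
empty_cols = toy.columns[toy.isna().all()].tolist()

# Drop columns where every value is NaN; partly filled columns are kept.
cleaned = toy.dropna(axis=1, how='all')

print(empty_cols)
print(cleaned.columns.tolist())
```

`dropna` only treats missing values (NaN/None) as empty, not empty strings, which is why the call has to come before NaN is replaced with ''.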

That takes care of the first part. For the second part, you can use reindex:

# select prefixes:
prefixes = ['sj00', 'sj12', 'cr00', 'cr08', 'eu00', 'eu50', 'sh00', 'dt00', 'sh00', 'dt00']

df = df.reindex(prefixes, axis=1).dropna(axis=1, how='all').replace(np.nan, '', regex=True)

Note `axis=1` rather than `axis=0`, the same as in my suggestion for question 1.
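
To make the difference concrete, the sketch below compares an explicit membership check with the reindex approach on a toy frame. The data and the names `toy`, `present`, `missing`, `via_reindex` and `via_filter` are assumptions for illustration only. It assumes the pivoted frame has unique column labels.

```python
import pandas as pd

# Toy pivoted frame: only two of the requested prefixes exist as columns.
toy = pd.DataFrame({
    'sj00': ['sj000001', 'sj000002'],
    'cr00': ['cr000011', 'cr000012'],
})
prefixes = ['sj00', 'sj12', 'cr00', 'sh00']

# Explicit check: which requested prefixes are present, which are missing?
present = [p for p in prefixes if p in toy.columns]
missing = [p for p in prefixes if p not in toy.columns]

# toy[prefixes] would raise KeyError because some labels are missing.
# reindex creates the missing labels as all-NaN columns instead,
# and dropna(axis=1, how='all') removes them again.
via_reindex = toy.reindex(prefixes, axis=1).dropna(axis=1, how='all')

# Selecting only the present prefixes yields the same columns.
via_filter = toy[present]

print(missing)
print(via_reindex.columns.tolist())
```

The explicit check is useful if you want to report the missing prefixes; reindex is shorter when you only need the final table.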


0
votes

Many thanks to Quang Hoang for the hint in his post. As a workaround, I am doing the following until a better answer comes along:

# Select prefixes
prefixes = ['sj00', 'sj12', 'cr00', 'cr08', 'eu00', 'eu50']

df = pd.read_csv('new_hosts', index_col=False, header=None)

df['prefix'] = df[0].str[:4]

df['grp'] = df.groupby('prefix').cumcount()

df = df.pivot(index='grp', columns='prefix', values=0)

df = df[prefixes]

# For column `sj12`, only extract the values that start with `sj12` and have a word character immediately after it, like `sj12[a-z]`
# (raw string so the regex backslashes are not treated as Python escapes)
df['sj12'] = df['sj12'].str.extract(r'(\w{2}\d{2}\w\*)', expand=True)

df.replace('', np.nan, inplace=True)

# Remove the empty columns
df = df.dropna(axis=1, how='all')

# Drop rows where every value is NaN, then replace the remaining NaN with empty strings
df = df.dropna(axis=0, how='all').replace(np.nan, '', regex=True)

# drop the index field
df = df.rename_axis(None)

print(df)
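
One possible way to combine the workaround with the reindex suggestion from the other answer is to wrap the steps in a helper that tolerates prefixes missing from the file. This is only a sketch: `build_table` is a hypothetical name, the in-memory host list stands in for new_hosts, and the `sj12` regex step is left out for brevity.

```python
import io
import pandas as pd

def build_table(source, prefixes):
    """Pivot host names into one column per 4-character prefix.

    Hypothetical helper: keeps only the requested prefixes that
    actually have data, in the order given.
    """
    df = pd.read_csv(source, header=None)
    df['prefix'] = df[0].str[:4]
    df['grp'] = df.groupby('prefix').cumcount()
    table = df.pivot(index='grp', columns='prefix', values=0)
    # reindex tolerates absent prefixes: they become all-NaN columns
    # and are removed by dropna(axis=1, how='all').
    table = table.reindex(prefixes, axis=1).dropna(axis=1, how='all')
    # Drop fully empty rows, then blank out the remaining NaN cells.
    table = table.dropna(axis=0, how='all').fillna('')
    return table.rename_axis(None)

# Stand-in for the new_hosts file (made-up sample, not the original data).
hosts = io.StringIO("sj000001\nsj000002\ncr000011\neu500001\n")
result = build_table(hosts, ['sj00', 'sj12', 'cr00', 'eu50'])
print(result)
```

Here 'sj12' is requested but absent from the input, so it does not raise an error and does not appear in the result.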