Regex to match multiple prefixes and unpack them into columns

Question (votes: 0, answers: 1)

Any suggestions for a regular expression that would turn this Series

import pandas as pd
import numpy as np

data = [
    'Apple: very tasty',
    'Banana: Unpleasant',
    'Apple: quite nice  Banana: not bad either',
    '',
]

ser = pd.Series(data=data)


into this resulting DataFrame?

pd.DataFrame(data=[
    ['very tasty', np.nan],
    [np.nan, 'Unpleasant'],
    ['quite nice', 'not bad either'],
    [np.nan, np.nan],
], columns = ['Apple', 'Banana'])


If both Apple and Banana are present, they always appear in the order Apple, Banana and are separated by a double space.

python regex pandas
1 Answer
Votes: 1

You can do it like this:

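# Split each entry on a colon (plus any whitespace after it) or on a double space.
# The result alternates prefix/value columns: 0 = prefix, 1 = value, 2 = prefix, 3 = value.
df = ser.str.split(r':\s*|\s\s', expand=True)

# Reshape into (prefix, value) pairs; the index repeats each original row number once per pair.
df_out = pd.DataFrame(df.values.reshape(-1, 2),
                      index=np.repeat(np.arange(df.shape[0]), df.shape[1] // 2))

# Pivot the prefixes into columns, then drop the columns that contain only NaN.
df_out.set_index(0, append=True)[1].unstack().dropna(axis=1, how='all')

Or, building df_out one prefix/value column pair at a time:

df = ser.str.split(r':\s*|\s\s', expand=True)

# Stack each pair of columns (0-1, 2-3, ...) underneath the previous one.
df_out = pd.concat([pd.DataFrame(df.iloc[:, i:i + 2].values)
                    for i in range(0, df.shape[1], 2)])

df_out.set_index(0, append=True)[1].unstack().dropna(axis=1, how='all')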

Output:

         Apple           Banana
0   very tasty              NaN
1          NaN       Unpleasant
2   quite nice   not bad either
3          NaN              NaN
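
Because the question guarantees the order Apple, Banana, a single pattern with named groups may also be enough. The following is only a sketch under that assumption: the pattern and the names `pattern` and `result` are illustrative and not part of the original answer, and it would need adjusting if more prefixes or a different order were possible.

```python
import pandas as pd

data = [
    'Apple: very tasty',
    'Banana: Unpleasant',
    'Apple: quite nice  Banana: not bad either',
    '',
]
ser = pd.Series(data=data)

# Assumed pattern: an optional "Apple: ..." part, optional separating
# whitespace, then an optional "Banana: ..." part. The Apple group is lazy
# so it stops before the separator that precedes "Banana:".
pattern = (
    r'^(?:Apple:\s*(?P<Apple>.*?))?'
    r'\s*'
    r'(?:Banana:\s*(?P<Banana>.*))?$'
)

# Named groups become the column names; groups that do not take part in
# the match are returned as NaN.
result = ser.str.extract(pattern)
print(result)
```

Every part of the pattern is optional, so the empty string still matches and yields a row of NaN values rather than being dropped.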