Create new pandas columns based on whether another column contains a string

Question (votes: 0, answers: 2)

I have:

df = pd.DataFrame({'col1': {0: 'success', 1: 'failed', 2: 'variable x 10', 3: 'variable xr 134', 4: 'error', 5: 'not found'}})

I want to create 3 columns depending on whether a substring is present in the main column.

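  • If the column contains the word variable, I want to split the string and take the second element.
  • If the column contains the word variable, I want to split the string and take the third element. This value may also be a string.
  • If the column contains one of the keywords success, fail, error, not found, I want to put the corresponding keyword Success, Fail, Error.
  • If none of the keywords is present, I want None or ''.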

I tried:

df['var']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' '),'')
df['val']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' '),'')
df['other']=np.where(df['col1'].str.contains('not found'|'error'),'Error',
                     np.where(df['col1'].str.contains('success'),'Success',
                              np.where(df['col1'].str.contains('fail'),'Fail','')))

But the str.contains call raises the error

unsupported operand type(s) for |: 'str' and 'str'

and I cannot select a specific part of the split string, because I get the error

operands could not be broadcast together with shapes (6,) (3,) ()

when I try

df['val']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' ')[2],'')
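
Both errors have a local cause. `'not found'|'error'` applies Python's `|` operator to two strings, whereas `str.contains` expects the alternation inside a single regex string. And `df['col1'].str.split(' ')[2]` selects the row labeled 2 (one three-element list) instead of the third element of every row, which is why the shapes (6,) and (3,) cannot be broadcast. The following is only a minimal sketch of how the attempt could be repaired, assuming unmatched rows should hold ''; the helper names `is_var` and `parts` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['success', 'failed', 'variable x 10',
                            'variable xr 134', 'error', 'not found']})

# Boolean mask of the rows that mention "variable"
is_var = df['col1'].str.contains('variable')
# One list of tokens per row
parts = df['col1'].str.split(' ')

# .str[i] indexes into each row's list; a missing position yields NaN
df['var'] = np.where(is_var, parts.str[1], '')
df['val'] = np.where(is_var, parts.str[2], '')

# The alternation lives inside one regex string
df['other'] = np.where(df['col1'].str.contains('not found|error'), 'Error',
              np.where(df['col1'].str.contains('success'), 'Success',
              np.where(df['col1'].str.contains('fail'), 'Fail', '')))

print(df)
```

Because `parts.str[1]` and `parts.str[2]` are evaluated for every row, the `np.where` condition only decides which of the precomputed values is kept.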

This is the final result I want, as shown in the image.

Any suggestions?

pandas contains
2 Answers
0 votes

I would extract each of the three parts (if present), then concat the whole thing:

others = {
    "success": "Success", "fail": "Fail",
    "error": "Error", "not found": "Error"
}

pat_vax = r"variable\s+(?P<var>\S+)\s+(?P<val>\S+)"
pat_oth = r"(%s)" % "|".join(others)

out = (
    pd.concat([
        df, df["col1"].str.extract(pat_vax),
        df["col1"].str.extract(pat_oth, expand=False)
            .map(others).rename("other")], axis=1
    )
)

Output:

print(out)

               col1  var   val    other
0           success  NaN   NaN  Success
1            failed  NaN   NaN     Fail
2     variable x 10    x    10      NaN
3  variable xr 134r   xr  134r      NaN
4             error  NaN   NaN    Error
5         not found  NaN   NaN    Error

[6 rows x 4 columns]
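
The question asks for None or '' where nothing matches, while `str.extract` leaves NaN in those cells. As a hedged sketch (not part of the answer above), the same pipeline can be followed by `fillna("")`; the intermediate names `extracted` and `other` are assumptions made here for readability, and the input is the frame from the question.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['success', 'failed', 'variable x 10',
                            'variable xr 134', 'error', 'not found']})

others = {
    "success": "Success", "fail": "Fail",
    "error": "Error", "not found": "Error"
}

pat_vax = r"variable\s+(?P<var>\S+)\s+(?P<val>\S+)"
pat_oth = r"(%s)" % "|".join(others)

# Named groups become the columns "var" and "val"
extracted = df["col1"].str.extract(pat_vax)
# Single group -> Series; map the matched keyword to its label
other = (df["col1"].str.extract(pat_oth, expand=False)
         .map(others).rename("other"))

# Replace the NaN of unmatched rows with empty strings
out = pd.concat([df, extracted, other], axis=1).fillna("")
print(out)
```

If None is preferred over '', the `fillna` call can simply be dropped and the NaN values handled downstream.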

0 votes

You can use replace to create a CSV-like column, then extract the fields:

dmap = {
    'success': ',,Success',
    'failed': ',,Fail',
    'error': ',,Error',
    'not found': ',,Error',
    r'variable\s+([^\s]+)\s+(.*)': r'\1,\2,',
    '^(?!success|failed|error|not found|variable).*$': ',,'
}

cols = (df['col1'].replace(dmap, regex=True)
                  .str.extract(r'(?P<var>.*),(?P<val>.*),(?P<other>.*)'))
out = pd.concat([df, cols], axis=1)

Output:

>>> out
               col1 var   val    other
0           success            Success
1            failed               Fail
2     variable x 10   x    10         
3  variable xr 134r  xr  134r         
4             error              Error
5         not found              Error
6           nothing                   