Create new pandas columns based on strings contained in another column

Question

I have:

df = pd.DataFrame({'col1': {0: 'success', 1: 'failed', 2: 'variable x 10', 3: 'variable xr 134', 4: 'error', 5: 'not found'}})

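I want to create 3 columns based on whether a substring is present in the main column:

- If the column contains the word `variable`, I want to split the string and take the second element.
- If the column contains the word `variable`, I want to split the string and take the third element. This value can also be a string.
- If the column contains one of the keywords `success`, `fail`, `error` or `not found`, I want to put the keyword `Success`, `Fail` or `Error`.

If none of the keywords is present, I want `None` or `''`.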

I tried:

df['var']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' '),'')
df['val']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' '),'')
df['other']=np.where(df['col1'].str.contains('not found'|'error'),'Error',
                     np.where(df['col1'].str.contains('success'),'Success',
                              np.where(df['col1'].str.contains('fail'),'Fail','')))

But I get the error

unsupported operand type(s) for |: 'str' and 'str'

from `str.contains`, and I cannot select a specific part of the split string, because I get the error

operands could not be broadcast together with shapes (6,) (3,) ()

when I try:

df['val']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' ')[2],'')

As shown in the image, this is the final result I want.

Any suggestions?
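Both errors have a direct cause. `'not found'|'error'` applies Python's `|` operator to two plain strings before `str.contains` is ever called, and `.str.split(' ')[2]` selects the row labelled 2 (a single 3-element list) rather than the third element of every list. The following is a minimal sketch of the same `np.where` approach with those two points changed, assuming the sample data from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["success", "failed", "variable x 10",
                            "variable xr 134", "error", "not found"]})

is_var = df["col1"].str.contains("variable")
tokens = df["col1"].str.split(" ")

# .str[n] takes the n-th element of each list; a bare [n] would select row n.
df["var"] = np.where(is_var, tokens.str[1], "")
df["val"] = np.where(is_var, tokens.str[2], "")

# The alternation goes inside one regex string, not between two Python strings.
df["other"] = np.where(
    df["col1"].str.contains("not found|error"), "Error",
    np.where(df["col1"].str.contains("success"), "Success",
             np.where(df["col1"].str.contains("fail"), "Fail", "")))

print(df)
```

Rows whose split list is too short yield NaN from `.str[n]`, but those rows are masked out by `is_var`, so the empty string is used instead.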

Answer 1

I would `extract` each of the three parts (if present), then `concat` the whole thing:

others = {
    "success": "Success", "fail": "Fail",
    "error": "Error", "not found": "Error"
}

pat_vax = r"variable\s+(?P<var>\S+)\s+(?P<val>\S+)"
pat_oth = r"(%s)" % "|".join(others)

out = (
    pd.concat([
        df, df["col1"].str.extract(pat_vax),
        df["col1"].str.extract(pat_oth, expand=False)
            .map(others).rename("other")], axis=1
    )
)

Output:

print(out)

               col1  var   val    other
0           success  NaN   NaN  Success
1            failed  NaN   NaN     Fail
2     variable x 10    x    10      NaN
3  variable xr 134r   xr  134r      NaN
4             error  NaN   NaN    Error
5         not found  NaN   NaN    Error

[6 rows x 4 columns]
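To make the two `extract` calls easier to follow, the sketch below runs them separately and prints each intermediate result. It assumes the input matches the output shown above (row 3 is `variable xr 134r`); the names `parts` and `keyword` are introduced here only for illustration.

```python
import pandas as pd

# Sample data assumed from the printed output above.
df = pd.DataFrame({"col1": ["success", "failed", "variable x 10",
                            "variable xr 134r", "error", "not found"]})

others = {"success": "Success", "fail": "Fail",
          "error": "Error", "not found": "Error"}

# Named groups become column names; rows that do not match get NaN.
parts = df["col1"].str.extract(r"variable\s+(?P<var>\S+)\s+(?P<val>\S+)")

# With a single group and expand=False, extract returns a Series
# holding the keyword that matched (or NaN).
keyword = df["col1"].str.extract(r"(%s)" % "|".join(others), expand=False)

# map translates each matched keyword into its label; NaN stays NaN.
other = keyword.map(others).rename("other")

print(parts)
print(keyword.tolist())
print(other.tolist())
```

Because the pattern is searched anywhere in the string, the key `fail` also matches `failed`, which is why that row is labelled `Fail`.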

Answer 2

You can use `replace` to create a CSV-like column, then extract the fields:

dmap = {
    'success': ',,Success',
    'failed': ',,Fail',
    'error': ',,Error',
    'not found': ',,Error',
    r'variable\s+([^\s]+)\s+(.*)': r'\1,\2,',
    '^(?!success|failed|error|not found|variable).*$': ',,'
}

cols = (df['col1'].replace(dmap, regex=True)
                  .str.extract(r'(?P<var>.*),(?P<val>.*),(?P<other>.*)'))
out = pd.concat([df, cols], axis=1)

Output:

>>> out
               col1 var   val    other
0           success            Success
1            failed               Fail
2     variable x 10   x    10         
3  variable xr 134r  xr  134r         
4             error              Error
5         not found              Error
6           nothing
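The intermediate CSV-like strings are what make this approach work, so it may help to look at them before the final `extract`. This is a minimal sketch, assuming the input also contains a `nothing` row (it appears in the output above but not in the question's data); `csv_like` is a name introduced here for illustration.

```python
import pandas as pd

# Sample data assumed from the printed output above, including "nothing".
df = pd.DataFrame({"col1": ["success", "failed", "variable x 10",
                            "variable xr 134r", "error", "not found",
                            "nothing"]})

dmap = {
    'success': ',,Success',
    'failed': ',,Fail',
    'error': ',,Error',
    'not found': ',,Error',
    r'variable\s+([^\s]+)\s+(.*)': r'\1,\2,',
    # Anything that starts with none of the known words becomes empty fields.
    '^(?!success|failed|error|not found|variable).*$': ',,'
}

# Step 1: rewrite every row as "var,val,other".
csv_like = df['col1'].replace(dmap, regex=True)
print(csv_like.tolist())

# Step 2: split the three comma-separated fields into named columns.
cols = csv_like.str.extract(r'(?P<var>.*),(?P<val>.*),(?P<other>.*)')
print(cols)
```

Unlike the `extract`-based answer, missing fields come out as empty strings rather than NaN, which matches the `''` option mentioned in the question.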
