I have:
df = pd.DataFrame({'col1': {0: 'success', 1: 'failed', 2: 'variable x 10', 3: 'variable xr 134', 4: 'error', 5: 'not found'}})
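I want to create 3 columns based on whether a substring is present in the main column:
- var: if the string contains variable, I want to split the string and take the second element.
- val: if the string contains variable, I want to split the string and take the third element. This value can also be a string.
- other: if the string contains success, fail, error or not found, I want to put the keyword Success, Fail or Error.
Otherwise the value should be None or ''.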
I tried:
df['var']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' '),'')
df['val']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' '),'')
df['other']=np.where(df['col1'].str.contains('not found'|'error'),'Error',
np.where(df['col1'].str.contains('success'),'Success',
np.where(df['col1'].str.contains('fail'),'Fail','')))
But I get the error
unsupported operand type(s) for |: 'str' and 'str'
with str.contains, and I cannot select a specific part of the split string, because I get the error
operands could not be broadcast together with shapes (6,) (3,) ()
when trying:
df['val']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' ')[2],'')
Any suggestions?
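For reference, both errors can be traced to the attempt itself. 'not found'|'error' applies Python's | operator to two strings before str.contains is ever called; the alternation belongs inside a single regex string. And df['col1'].str.split(' ')[2] selects the row labelled 2 (a 3-element list) rather than the third piece of every row, which is why a shape of (3,) cannot be broadcast against 6 rows. A minimal sketch of one way the np.where idea could be repaired, assuming the data from the question (the helper names is_var and parts, and the switch to np.select for the nested conditions, are my own choices):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["success", "failed", "variable x 10",
                            "variable xr 134", "error", "not found"]})

is_var = df["col1"].str.contains("variable")
parts = df["col1"].str.split(" ")

# .str[i] indexes into each row's list; rows with too few pieces give NaN,
# but those rows are masked out by is_var anyway.
df["var"] = np.where(is_var, parts.str[1], "")
df["val"] = np.where(is_var, parts.str[2], "")

# The alternation goes inside one regex string, not between two Python strings.
df["other"] = np.select(
    [
        df["col1"].str.contains("not found|error"),
        df["col1"].str.contains("success"),
        df["col1"].str.contains("fail"),
    ],
    ["Error", "Success", "Fail"],
    default="",
)
print(df)
```

np.select takes the first matching condition per row, so it behaves like the nested np.where calls while staying flat.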
You can extract each of the three parts (where present) with str.extract, then concat everything together:
others = {
    "success": "Success", "fail": "Fail",
    "error": "Error", "not found": "Error"
}
pat_vax = r"variable\s+(?P<var>\S+)\s+(?P<val>\S+)"
pat_oth = r"(%s)" % "|".join(others)
out = (
    pd.concat([
        df, df["col1"].str.extract(pat_vax),
        df["col1"].str.extract(pat_oth, expand=False)
            .map(others).rename("other")], axis=1
    )
)
Output:
print(out)
               col1  var   val    other
0           success  NaN   NaN  Success
1            failed  NaN   NaN     Fail
2     variable x 10    x    10      NaN
3  variable xr 134r   xr  134r      NaN
4             error  NaN   NaN    Error
5         not found  NaN   NaN    Error
[6 rows x 4 columns]
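The question asked for None or '' where nothing matches, whereas str.extract leaves NaN in those cells. A small sketch of the same extract-and-concat idea with a trailing fillna(""), assuming the data from the question (the names pat_var, var_val and other are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["success", "failed", "variable x 10",
                            "variable xr 134", "error", "not found"]})

others = {
    "success": "Success", "fail": "Fail",
    "error": "Error", "not found": "Error",
}
pat_var = r"variable\s+(?P<var>\S+)\s+(?P<val>\S+)"
pat_oth = "(%s)" % "|".join(others)

# Named groups become the column names var and val.
var_val = df["col1"].str.extract(pat_var)
# A single group with expand=False gives a Series, which is then mapped to keywords.
other = df["col1"].str.extract(pat_oth, expand=False).map(others).rename("other")

# Replace the NaN left by non-matching rows with empty strings.
out = pd.concat([df, var_val, other], axis=1).fillna("")
print(out)
```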
You can use replace to build a CSV-like column, then extract the fields:
dmap = {
    'success': ',,Success',
    'failed': ',,Fail',
    'error': ',,Error',
    'not found': ',,Error',
    # raw strings, so \s and \1 reach the regex engine unchanged
    r'variable\s+([^\s]+)\s+(.*)': r'\1,\2,',
    # anything else becomes three empty fields
    r'^(?!success|failed|error|not found|variable).*$': ',,'
}
cols = (df['col1'].replace(dmap, regex=True)
                  .str.extract(r'(?P<var>.*),(?P<val>.*),(?P<other>.*)'))
out = pd.concat([df, cols], axis=1)
Output:
>>> out
               col1 var   val    other
0           success            Success
1            failed               Fail
2     variable x 10   x    10
3  variable xr 134r  xr  134r
4             error              Error
5         not found              Error
6           nothing
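To see what "CSV-like" means here, it may help to print the intermediate column before the extract step. The sketch below assumes the data from the question plus an extra "nothing" row, which is my assumption mirroring the last row of the output above; csv_like is just an illustrative name. Each row should come out as three comma-separated fields, some of them empty.

```python
import pandas as pd

# The "nothing" row is an assumption, added to mirror the unmatched row in the output above.
df = pd.DataFrame({"col1": ["success", "failed", "variable x 10",
                            "variable xr 134", "error", "not found", "nothing"]})

dmap = {
    "success": ",,Success",
    "failed": ",,Fail",
    "error": ",,Error",
    "not found": ",,Error",
    r"variable\s+([^\s]+)\s+(.*)": r"\1,\2,",
    r"^(?!success|failed|error|not found|variable).*$": ",,",
}

# Intermediate step only: every value is rewritten into a "var,val,other" string.
csv_like = df["col1"].replace(dmap, regex=True)
print(csv_like.tolist())
```

Because every rewritten value has exactly two commas, the final pattern (?P<var>.*),(?P<val>.*),(?P<other>.*) can split it into the three columns unambiguously.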