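I am trying to update a CSV in which each row contains several single-quoted strings, replacing each of those strings with a literal. However, all of the data ends up in the first row of the output. Can anyone suggest what is wrong with the code below: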
import pandas as pd
import re
df=pd.read_csv("t1.csv");
col1=df['col1']
col2=re.sub(r'\'([^\']*)\'','const',str(col1))
col3 = pd.Series(col2)
df['col1']=col3
df.to_csv('t_u.csv')
exit()
The file t1.csv contains the following data:
col1
This one has 'many' 'such' 'quotes' in it.
Now it does not.
But 'this' 'one' does 'have' it 'too'.
The generated output looks like this, which is wrong because all of the data ends up in a single row:
col1
0 "0 This one has const const const in it.
1 Now it does not.
2 But const const does const it const.
Name: col1, dtype: object"
1
2
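What happens here is that all 3 rows are merged into a single row in the final output, whereas I want the resulting CSV to keep the same format: 3 rows, with the required changes applied.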
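str(col1) converts the whole Series into a single string (its printed representation), so re.sub returns one string and everything lands in the first row. Use str.replace with a regular expression instead, which is applied to each row separately: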
df['col1'] = df['col1'].str.replace(r'\'([^\']*)\'', 'const', regex=True)
Output:
0 This one has const const const in it.
1 Now it does not.
2 But const const does const it const.
Name: col1, dtype: object
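For reference, here is a minimal self-contained sketch of the whole round trip. The in-memory csv_text is a stand-in for t1.csv built from the sample rows in the question. Passing index=False to to_csv is an assumption about the desired output (it drops the 0, 1, 2 index column); the original code did not use it. The pattern is the same one as above, written without the unused capture group.

```python
import io
import pandas as pd

# Stand-in for t1.csv, built from the sample rows in the question.
csv_text = (
    "col1\n"
    "This one has 'many' 'such' 'quotes' in it.\n"
    "Now it does not.\n"
    "But 'this' 'one' does 'have' it 'too'.\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# str() on a Series gives ONE string: the printed form of the whole column.
# That is why the original code produced a single cell.
as_text = str(df["col1"])
print(type(as_text).__name__)

# Series.str.replace applies the pattern to each element,
# so the number of rows is unchanged.
df["col1"] = df["col1"].str.replace(r"'[^']*'", "const", regex=True)

# Assumption: the index column is not wanted in the output file.
# With no path argument, to_csv returns the CSV text instead of writing a file.
out = df.to_csv(index=False)
print(out)
```

To write a file instead, pass a path, for example df.to_csv('t_u.csv', index=False).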