How do I apply a regular expression for multiple phrases to a dataframe column?

Question  Votes: 0  Answers: 1

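Hello, I have a dataframe, and I want to remove a specific set of prefixes ('Fwd', 'Re', 'RE') from every row that starts with or contains them. The problem is that I don't know how to write a regular expression that handles every case.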

My dataframe looks like this:

      summary 
0 Fwd: Please look at the attached documents and take action 
1 NSN for the ones who care
2 News for all team members 
3 Fwd:RE:Re: Please take action on the action needed items 
4 Fix all the mistakes please 
5 Fwd:Re: Take action on the attachments in this email 
6 Fwd:RE: Action is required 

I want the resulting dataframe to look like this:

          summary 
0 Please look at the attached documents and take action 
1 NSN for the ones who care
2 News for all team members 
3 Please take action on the action needed items 
4 Fix all the mistakes please 
5 Take action on the attachments in this email 
6 Action is required 

To get rid of 'Fwd', I used: df['msg'].str.replace(r'^Fwd:', '')

python regex dataframe
1 Answer
0
votes

If the prefixes can appear anywhere in the string, you can use a repeating pattern that starts with a word boundary:

\b(?:(?:Fwd|R[eE]):)+\s*

Regex demo

In the replacement, use an empty string.
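As a minimal sketch of how the pattern behaves, the snippet below applies it to the sample rows from the question using only Python's standard `re` module. The names `PREFIX_RE`, `summaries`, and `cleaned` are made up for illustration, and the rows are copied from the question without their trailing spaces.

```python
import re

# Pattern from the answer: a word boundary, then one or more
# "Fwd:", "Re:" or "RE:" prefixes, then any whitespace after them.
PREFIX_RE = re.compile(r"\b(?:(?:Fwd|R[eE]):)+\s*")

# Sample rows taken from the question's "summary" column.
summaries = [
    "Fwd: Please look at the attached documents and take action",
    "NSN for the ones who care",
    "News for all team members",
    "Fwd:RE:Re: Please take action on the action needed items",
    "Fix all the mistakes please",
    "Fwd:Re: Take action on the attachments in this email",
    "Fwd:RE: Action is required",
]

# Replace every match with an empty string, as the answer suggests.
cleaned = [PREFIX_RE.sub("", s) for s in summaries]

for line in cleaned:
    print(line)
```

On a pandas column the same pattern should work as df['summary'].str.replace(r'\b(?:(?:Fwd|R[eE]):)+\s*', '', regex=True). Recent pandas versions treat the pattern as a literal string unless regex=True is passed. Note that the pattern as written does not match lowercase 'fwd:' or 're:'. If those also need to go, adding a case-insensitive flag (re.IGNORECASE, or case=False in pandas) is one option.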
