在Python中使用正则表达式替换换行符...

Question

Python新手，使用3.5。我觉得这个问题与这里提出的其他问题类似，但是尽管已经阅读了这些问题并尝试遵循给出的建议，但我仍然没有在这个正则表达式上取得任何进展。

我有一串文本，我想用空格替换所有后面不跟另一个换行符或三个空格的换行符。我正在尝试使用带有负向前瞻的正则表达式来执行此操作。我从此对话中了解到我需要使用多行。不过，我的正则表达式仍然无法识别字符串中的任何内容。基本上，我想匹配并替换位于下面字符串的中间，而字符串开头和结尾的部分保持不变。

body = 'foo foo\r\n\xa0\xa0\xa0foo foo foo\r\n\foo foo foo foo foo\r\n\r\n\foo foo foo'

breakRegex = re.compile(r'(\r\n)?!(\r\n)|(\r\n)?!(\s\s\s)', s,re.M)

breakRegex.sub(' ', body)

期望且迄今为止尚未达到的结果是：

'foo foo\r\n\xa0\xa0\xa0foo foo foo foo foo foo foo foo\r\n\r\n\foo foo foo'

我也尝试了上面的方法，没有那么多括号，用 \s 代替 \xa0 等，但它仍然不起作用......感谢您提供的任何帮助。

Answer 1

这是你想要的吗？

break_regex = re.compile(r'\r\n(?!=\r\n|\s\s\s)', re.M)

所有换行符
\r\n
，后面没有
(?!=...)
，要么 (
|
), 另一个换行符
\r\n
，或三个空格
\s\s\s
。

编辑：

抱歉，我犯了一个错误，您应该尽快删除正则表达式中的
```
=
```
。 :)
你是这个意思吗？:

body = 'foo foo \xa0\xa0\xa0foo foo foo foo foo foo foo foo 富富富'

代替：

body = 'foo foo \xa0\xa0\xa0foo foo foo oo foo foo foo foo 呜呜呜'`

因为

\f

表示换页 (

0x0c

)。

Answer 2

def clean_with_puncutation(text):    
    from string import punctuation
    import re
    punctuation_token={p:'<PUNC_'+p+'>' for p in punctuation}
    punctuation_token['<br/>']="<TOKEN_BL>"
    punctuation_token['\n']="<TOKEN_NL>"
    punctuation_token['<EOF>']='<TOKEN_EOF>'
    punctuation_token['<SOF>']='<TOKEN_SOF>'
  #punctuation_token



    regex = r"(<br/>)|(<EOF>)|(<SOF>)|[\n\!\@\#\$\%\^\&\*\(\)\[\]\
           {\}\;\:\,\.\/\?\|\`\_\\+\\\=\~\-\<\>]"

###Always put new sequence token at front to avoid overlapping results
 #text = '<EOF>!@#$%^&*()[]{};:,./<>?\|`~-= _+\<br/>\n <SOF>\ '
    text_=""

    matches = re.finditer(regex, text)

    index=0

    for match in matches:
     #print(match.group())
     #print(punctuation_token[match.group()])
     #print ("Match at index: %s, %s" % (match.start(), match.end()))
        text_=text_+ text[index:match.start()] +" " 
              +punctuation_token[match.group()]+ " "
        index=match.end()
    return text_

Answer 3

这里是需要转义

\n

符号的 VSCode。那么你需要更换：

\\r\\n(?!\\r\\n|\s\s\s|\\xa0\\xa0\\xa0)

与：

.

这仍然会标记右侧最后一个

\r\n

：

这就是为什么你需要在前面加上负面的看法：

(?<!\\r\\n)\\r\\n(?!\\r\\n|\s\s\s|\\xa0\\xa0\\xa0)

如果你不想连续写三次相同的内容，你需要非捕获组

(?:...)

：

（？

在Python中使用正则表达式替换换行符...

问题描述投票：0回答：3

3个回答

最新问题

在Python中使用正则表达式替换换行符...

问题描述 投票：0回答：3

3个回答

最新问题

问题描述投票：0回答：3