How to replace a period with a whitespace, but not all periods?

Question

How can I replace a period with a whitespace, but only for specific periods rather than all of them?

For example:

this_string = 'Man is weak.So they die'
that_string = 'I have a Ph.d'

Here, I would like to get these results:

this_string = 'Man is weak So they die'
that_string = 'I have a Phd'

I want titles like Ph.d to stay as one word, while the period joining two sentences is replaced with a space.

This is what I have so far:

re.sub(r'[^A-Za-z0-9\s]+', ' ', this_string)

This replaces every period with a space.

Any ideas on how to improve this?

Answer 1

You can use two regular expressions as rules to change the text:

import re

text = 'Man is weak.So they die. I have a Ph.d.'

text = re.sub(r'([A-Za-z ]{1})(\.)([A-Z]{1})', r'\g<1>. \g<3>', text)  # remove the dot in r'\g<1>. \g<3>' to get '...weak So...'
print(text)  # Man is weak. So they die. I have a Ph.d.

text = re.sub(r'([A-Za-z ]{1})(\.)([a-z]{1})', r'\g<1>\g<3>', text)
print(text)  # Man is weak. So they die. I have a Phd.

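In the end, this is not robust, because it is a rule-based transformation. Something like Ph.D will not work: the first rule sees a period followed by an uppercase letter and treats it as a sentence boundary.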
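The two rules above can also be folded into a single pass with a replacement function. This is only a minimal sketch under the same assumption as this answer (uppercase after the period means a new sentence, lowercase means an abbreviation); `replace_periods` is a hypothetical helper name, not something from the question.

```python
import re

def replace_periods(text):
    """Hypothetical helper: handle a period squeezed between two letters.

    Assumption (same rule as above): if the next letter is uppercase the
    period ends a sentence and becomes a space; otherwise it is dropped.
    """
    def repl(match):
        next_letter = match.group(1)  # captured inside the lookahead
        return ' ' if next_letter.isupper() else ''

    # Match only the period itself; the neighbours are checked with lookarounds.
    return re.sub(r'(?<=[A-Za-z])\.(?=([A-Za-z]))', repl, text)

print(replace_periods('Man is weak.So they die'))  # Man is weak So they die
print(replace_periods('I have a Ph.d'))            # I have a Phd
print(replace_periods('I have a Ph.D'))            # I have a Ph D
```

The last call shows the same limitation: the rule cannot tell Ph.D from a sentence boundary.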

Answer 2

You could first replace all the dots in question with a new marker, then split on that marker:

import re

abbreviations = ["Dr.", "Mrs.", "Mr.", "Ph.d"]
# escape the abbreviations so their dots match literally, not any character
rx = re.compile(r'''(?:{})|((?<=[a-z])\.(?=\s*[A-Z]))'''.format("|".join(map(re.escape, abbreviations))))

data = "Man is weak.So they die. I have a Ph.d"

# substitute them first
def repl(match):
    if match.group(1) is not None:
        return "#!#"
    return match.group(0)

data = rx.sub(repl, data)
for sent in re.split(r"#!#\s*", data):
    print(sent.replace(".", ""))

This yields

Man is weak
So they die
I have a Phd

See a demo on ideone.com.
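If a single string is wanted (as in the question) rather than one sentence per line, the same idea can be wrapped in a function. This is a rough sketch, not a general solution: `join_sentences` and the extra "Ph.D" entry are assumptions added for illustration, and the abbreviation list has to be extended by hand.

```python
import re

# Assumed list for illustration; "Ph.D" is added to cover the uppercase spelling.
ABBREVIATIONS = ["Dr.", "Mrs.", "Mr.", "Ph.d", "Ph.D"]

def join_sentences(text, abbreviations=ABBREVIATIONS):
    """Hypothetical helper: sentence-ending periods become one space,
    periods inside known abbreviations are dropped."""
    protected = "|".join(re.escape(a) for a in abbreviations)
    # Group 1 matches a sentence-ending period plus any whitespace after it.
    rx = re.compile(r"(?:{})|((?<=[a-z])\.\s*(?=[A-Z]))".format(protected))

    def repl(match):
        if match.group(1) is not None:
            return " "                          # sentence boundary
        return match.group(0).replace(".", "")  # abbreviation: strip its dots

    return rx.sub(repl, text)

print(join_sentences("Man is weak.So they die. I have a Ph.d"))
# Man is weak So they die I have a Phd
print(join_sentences("She has a Ph.D"))
# She has a PhD
```

Because the abbreviation alternatives are tried first, their dots are consumed before the sentence-boundary rule can see them.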
