Inserting periods into a messy string for text analysis in Python


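I have a long, messy string that lacks sentence structure: it does not consistently contain periods.

As a result, I currently cannot split the long string into sentences, which my text analysis requires.

The following example shows what I have and the output I need.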

example_string = "Football is the world's most popular sport Played on rectangular fields, two teams of eleven players each compete to score goals One of the most famous teams is Real Madrid."

output_string = "Football is the world's most popular sport. Played on rectangular fields, two teams of eleven players each compete to score goals. One of the most famous teams is Real Madrid."

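My first idea was to insert a period wherever a lowercase word is followed by a capitalized word with no period in between. However, some words, especially names, also start with a capital letter, so I would insert periods in the wrong places (in the example, I would add one before "Real Madrid").

Any help would be greatly appreciated. Thanks!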

python nlp sentence
1 answer
0 votes

Perhaps you could use a regular expression to find a lowercase word followed by a capital letter:

import re
example_string = "Football is the world's most popular sport Played on rectangular fields, two teams of eleven players each compete to score goals One of the most famous teams is Real Madrid."
# Match whitespace that follows a lowercase letter and precedes a capital letter
pattern = re.compile(r'(?<=[a-z])\s+([A-Z])')
# Replace that whitespace with ". " and keep the captured capital letter
output_string = pattern.sub(r'. \1', example_string)
print(output_string)

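Output: Football is the world's most popular sport. Played on rectangular fields, two teams of eleven players each compete to score goals. One of the most famous teams is. Real. Madrid.

Note that this also inserts periods before "Real" and "Madrid", which are the false positives the question anticipated.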
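One way to reduce those false positives might be to work token by token and skip the split in two cases: when the previous word is itself capitalized (likely part of a name such as "Real Madrid"), and when the previous word is a function word that rarely ends a sentence (such as "is" or "the"). The sketch below is only a heuristic under those assumptions; the function name `insert_periods` and the stoplist `NON_FINAL_WORDS` are made up for illustration and are not part of the original answer.

```python
# Hypothetical stoplist: function words that rarely end an English sentence.
NON_FINAL_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "of", "in", "on",
    "at", "to", "and", "or", "for", "with", "by",
}


def insert_periods(text):
    """Insert a period between a lowercase word and a following capitalized word,
    unless the lowercase word is in the stoplist (heuristic, not exhaustive)."""
    tokens = text.split()
    result = []
    for i, token in enumerate(tokens):
        is_last = i + 1 == len(tokens)
        if (
            not is_last
            and tokens[i + 1][0].isupper()            # next word is capitalized
            and token[0].islower()                    # current word is not a name part
            and token[-1].islower()                   # no punctuation already present
            and token.lower() not in NON_FINAL_WORDS  # word can plausibly end a sentence
        ):
            token += "."
        result.append(token)
    return " ".join(result)


example_string = "Football is the world's most popular sport Played on rectangular fields, two teams of eleven players each compete to score goals One of the most famous teams is Real Madrid."
print(insert_periods(example_string))
```

For the example string this keeps "is Real Madrid" intact. It will still miss a boundary when a sentence ends with a capitalized word or a stoplist word, and it collapses runs of whitespace into single spaces, so treat it as a starting point rather than a general solution.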
