Splitting a string based on a semi-consistent feature

Question

I have a text file containing a transcript. I need a way to split it so that I end up with a list of strings, one for each thing a person says. So this:

mystr = '''Bob: Hello there, how are you? 

           Alice: I am fine how are you?'''

becomes this:

mylist= ['Bob: Hello there, how are you?','Alice: I am fine how are you?']

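I'm new to regular expressions but recognize that they are probably the way to go. The catch is that I want to iterate over many transcripts in which the names differ (e.g. John, Paul, George, Ringo). What is consistent is that there is a single word (the speaker), followed by a colon, followed by whitespace.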

Answer 1

import re
re.findall(r"\S[^:]+.*", mystr)
#-> ['Bob: Hello there, how are you? ', 'Alice: I am fine how are you?']

https://docs.python.org/3/library/re.html
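
To make it easier to see why this pattern does not depend on the speakers' names, here is the same regex written with re.VERBOSE and one comment per piece. This is only an annotated sketch of the pattern above; the names pattern and turns and the final strip() are additions for illustration.

```python
import re

mystr = '''Bob: Hello there, how are you? 

           Alice: I am fine how are you?'''

# Same pattern as above, spread out with re.VERBOSE so each piece can be
# commented. Whitespace inside the pattern is ignored in this mode.
pattern = re.compile(r"""
    \S      # first non-whitespace character (start of the speaker's name)
    [^:]+   # one or more characters that are not a colon (rest of the name)
    .*      # the colon and everything after it up to the end of the line
""", re.VERBOSE)

# strip() is an addition here: it drops the trailing space after "you? ".
turns = [m.strip() for m in pattern.findall(mystr)]
print(turns)
```

Note that [^:] also matches newlines, so a line without any colon would be absorbed into the match that ends on the next line containing one.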

Answer 2

import re
mystr = '''Bob: Hello there, how are you? 

           Alice: I am fine how are you?'''
[m.group(0).strip() for m in re.finditer(r"\w[^:]+.*", mystr)]

#['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']

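If a line is missing the colon, the following regex should work better than the previous one. It requires a colon right after the speaker word, so a line without one is skipped rather than merged into the next match: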

mystr = '''Bob Hello there, how are you? 

           Alice: I am fine how are you?'''
[m.group(0).strip() for m in re.finditer(r"\w{1,}:+.*", mystr)]
#['Alice: I am fine how are you?']
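
Both answers treat each line as one turn. As a rough sketch of the rule stated in the question (one word, then a colon, then whitespace), the transcript can also be split only where a new line starts with such a label, so a turn that wraps across several lines stays together. The helper name split_turns, the constant TURN_START, and the sample text are made up for illustration and are not from either answer.

```python
import re

# Hypothetical helper, not from the original answers.
# A new turn starts where a line begins with "<word>:<whitespace>".
# The lookahead keeps the speaker label in the following piece.
TURN_START = re.compile(r"\n\s*(?=\w+:\s)")

def split_turns(text):
    """Return one whitespace-normalized string per speaker turn."""
    parts = TURN_START.split(text.strip())
    # " ".join(p.split()) collapses line breaks and runs of spaces in a turn.
    return [" ".join(p.split()) for p in parts if p.strip()]

sample = '''Bob: Hello there,
           how are you?

           Alice: I am fine how are you?'''

turns = split_turns(sample)
print(turns)
```

One limitation of this sketch: a wrapped line that itself begins with a word and a colon (for example "Note: ...") would be taken for a new speaker.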
