I have a text file containing a transcript. I need a way to split it so that I end up with a list of strings, one for each thing a person said. So this:
mystr = '''Bob: Hello there, how are you?
Alice: I am fine how are you?'''
becomes this:
mylist = ['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']
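I am new to regular expressions, but I suspect they are the way to go. The catch is that I want to iterate over many transcripts in which the names differ (e.g. John, Paul, George, Ringo, etc.). What stays consistent is that each utterance starts with a single word (the speaker), followed by a colon, followed by whitespace.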
import re
re.findall(r"\S[^:]+.*", mystr)
#-> ['Bob: Hello there, how are you? ', 'Alice: I am fine how are you?']
import re
mystr = '''Bob: Hello there, how are you?
Alice: I am fine how are you?'''
[m.group(0).strip() for m in re.finditer(r"\w[^:]+.*", mystr)]
#['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']
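One caveat about the `[^:]+` part of the patterns above: it matches any character except a colon, and that includes a newline. A line with no colon therefore gets merged into the next line that has one. A small check on a made-up transcript (the name `sample` is only for illustration):

```python
import re

# Made-up transcript: the first line is missing its colon.
sample = '''Bob Hello there, how are you?
Alice: I am fine how are you?'''

# [^:]+ is free to consume the newline, so both lines come back as one match.
merged = re.findall(r"\w[^:]+.*", sample)
print(len(merged))
print(repr(merged[0]))
```

Here `merged` holds a single string that spans both lines, which is probably not what you want.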
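If a line might be missing its colon, the following regex should work better than the previous one, because it only matches where a word is immediately followed by a colon: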
mystr = '''Bob Hello there, how are you?
Alice: I am fine how are you?'''
[m.group(0).strip() for m in re.finditer(r"\w{1,}:+.*", mystr)]
#['Alice: I am fine how are you?']
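If you want to enforce exactly what the question describes (one word at the start of a line, then a colon, then whitespace), one option is to anchor the pattern with `re.MULTILINE`. This is only a sketch under that assumption. The names `SPEAKER_LINE` and `split_transcript` are hypothetical, and the pattern assumes each utterance fits on a single line.

```python
import re

# Assumption: every utterance sits on one line and starts with
# a single speaker word, a colon, then a space or tab.
SPEAKER_LINE = re.compile(r"^\w+:[ \t].*$", re.MULTILINE)

def split_transcript(text):
    """Return the 'Speaker: utterance' lines, without trailing whitespace."""
    return [m.group(0).rstrip() for m in SPEAKER_LINE.finditer(text)]

mystr = '''Bob: Hello there, how are you?
Alice: I am fine how are you?'''
print(split_transcript(mystr))
```

Because of the `^` anchor, a colon later in a line (for example `Bob Hello: there`) does not start a match, so such a line is skipped entirely rather than matched from the middle.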