Print names in first name, last name format

Question  Votes: 0  Answers: 3

I have a text file that contains data like the following:

Last name, First name in some of the cases

For example:

The patient was referred by Dr. Douglas, John, updated by: ‎Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo

I would like the output to be:

John Douglas
Rob Potter
Alisa Russo

The code I am using is:

print(str(string.partition(',')[2].split()[0] +" "+string.partition(',')[0].split()[0]))
regex python-3.x spacy data-extraction
3 Answers
1 vote

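You can first find the names that are preceded by "Dr." or followed by "M.D.", and then, when printing each name, swap the order of its parts if it contains a comma: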

import re
data = '''The patient was referred by Dr. Douglas, John, updated by: ‎Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo'''
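# A name is "Last, First" or "First Last"; it must follow "Dr. " or precede ", M.D."
name_pattern = r"[a-z'-]+,? [a-z'-]+"
pattern = r"(?<=Dr\. ){0}|{0}(?=,\s*M\.D\.)".format(name_pattern)
for name in re.findall(pattern, data, re.IGNORECASE):
    # Swap "Last, First" into "First Last"; leave "First Last" unchanged
    print(' '.join(name.split(', ')[::-1]) if ', ' in name else name)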

This outputs:

John Douglas
Rob Potter
Alisa Russo

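1 vote

The first challenge is capturing the doctors' first and last names. This is hard because some names are quite hairy. A regular expression with a few alternations can help, for example: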

(?:Dr\. )(\w+) (\w+)|(?:Dr\. )(\w+), (\w+)|(\w+), (\w+),?(?: ?M\.?D\.?)

Demo

Code Sample

import re

regex = r"(?:Dr\. )(\w+) (\w+)|(?:Dr\. )(\w+), (\w+)|(\w+), (\w+),?(?: ?M\.?D\.?)"

test_str = ("The patient was referred by Dr. Douglas, John, updated by: ‎Acosta, Christina\n"
    "The patient was referred by Potter, Rob,M.D.\n"
    "Sam was referred by Dr. Alisa Russo")

matches = re.finditer(regex, test_str, re.MULTILINE)
results = []

for match in matches:
    if match.group(1):    # "Dr. First Last"
        results.append([match.group(1), match.group(2)])
    elif match.group(3):  # "Dr. Last, First"
        results.append([match.group(4), match.group(3)])
    elif match.group(5):  # "Last, First, M.D."
        results.append([match.group(6), match.group(5)])

print(results)

The output is a list of lists. From there, printing is straightforward.

[['John', 'Douglas'], ['Rob', 'Potter'], ['Alisa', 'Russo']]
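
As a minimal sketch of that printing step, assuming `results` holds the list of lists shown above (the literal below just repeats that output, and `formatted` is a name introduced only for illustration):

```python
# Assumed input: the [first, last] pairs produced by the loop above
results = [['John', 'Douglas'], ['Rob', 'Potter'], ['Alisa', 'Russo']]

# Join each pair with a space to get "First Last"
formatted = [' '.join(pair) for pair in results]

for full_name in formatted:
    print(full_name)
```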

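0 votes

Honestly, I would grab the names first, using a regular expression. Once you have them, swap the first and last names based on the ','. Don't try to do everything in one step.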
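
A rough sketch of that two-step idea might look like the following. The helper names `extract_names` and `to_first_last` are hypothetical, and the pattern is an assumption that only covers the three sample lines from the question (names next to "Dr." or "M.D."); real data would likely need a broader pattern.

```python
import re

def extract_names(text):
    """Step 1: grab raw names that follow "Dr. " or precede ", M.D." (assumed pattern)."""
    word = r"[A-Za-z'-]+"
    pattern = (r"(?<=Dr\. )" + word + r",? " + word
               + r"|" + word + r", " + word + r"(?=,\s*M\.D\.)")
    return re.findall(pattern, text)

def to_first_last(raw_name):
    """Step 2: turn "Last, First" into "First Last"; leave other names unchanged."""
    if ',' in raw_name:
        last, first = [part.strip() for part in raw_name.split(',', 1)]
        return first + ' ' + last
    return raw_name

text = ("The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina\n"
        "The patient was referred by Potter, Rob,M.D.\n"
        "Sam was referred by Dr. Alisa Russo")

names = [to_first_last(raw) for raw in extract_names(text)]
print(names)  # ['John Douglas', 'Rob Potter', 'Alisa Russo']
```

Keeping the two steps separate means the extraction pattern can be adjusted without touching the swap logic, and each step can be checked on its own.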
