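I have the following Python script (in Jupyter) that should extract address information using a regular expression. Before this step, unit numbers have already been removed and street types abbreviated: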
type_opts = r"Terrace|Way|Walk|St|Rd|Ave|Cl|Ct|Cres|Blvd|Dr|Ln|Pl|Sq|Pde"
road_attrs_pattern = r"(?P<rd_no>\w?\d+(\-\d+)?\w?\s+)(?P<rd_nm>[a-zA-z \-]+)(?#\s+(?P<rd_tp>" + type_opts + ")"
print("Road Attr Pattern: ", road_attrs_pattern)
road_attrs = re.match(road_attrs_pattern, proc_addr)
road_num = road_attrs.group('rd_no').strip()
print("Road number: ", road_num)
road_name = road_attrs.group('rd_nm').strip()
print("Road name: ", road_name)
road_type = road_attrs.group('rd_tp').strip()
print("Road type: ", road_type)
I am using this address:
Burrah lodge, 15 Anne Jameson Pl
This produces the following output:
Road Attr Pattern: (?P<rd_no>\w?\d+(\-\d+)?\w?\s+)(?P<rd_nm>[a-zA-z \-]+)(?#\s+(?P<rd_tp>Terrace|Way|Walk|St|Rd|Ave|Cl|Ct|Cres|Blvd|Dr|Ln|Pl|Sq|Pde)
But it then throws an error indicating that the road number is not available: AttributeError: 'NoneType' object has no attribute 'group'
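However, pasting the same pattern and address into Regex101 suggests it should work, and from reading the regex I also think it should work...
It should print the following: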
Road Attr Pattern: (?P<rd_no>\w?\d+(\-\d+)?\w?\s+)(?P<rd_nm>[a-zA-z \-]+)(?#\s+(?P<rd_tp>Terrace|Way|Walk|St|Rd|Ave|Cl|Ct|Cres|Blvd|Dr|Ln|Pl|Sq|Pde)
Road number: 15
Road name: Anne Jameson
Road type: Pl
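
A note on the mismatch with Regex101, offered as a minimal sketch and not a definitive fix: Python's `re.match` only attempts a match at the very start of the string, whereas `re.search` (like Regex101) looks for a match anywhere in it. Since the address begins with `Burrah lodge,` and not with a number, `re.match` returns `None`, which is consistent with the `AttributeError` above. The sketch assumes `proc_addr` holds the address quoted above. As a further assumption, it writes the road type as an ordinary named group, because `(?#...)` is comment syntax in Python's `re` and would not create a group named `rd_tp`, and it uses `[a-zA-Z]` in place of `[a-zA-z]`.

```python
import re

# Assumption: proc_addr is the address quoted in the question.
proc_addr = "Burrah lodge, 15 Anne Jameson Pl"

type_opts = r"Terrace|Way|Walk|St|Rd|Ave|Cl|Ct|Cres|Blvd|Dr|Ln|Pl|Sq|Pde"

# Assumption: the road type is meant to be captured, so it is written as a
# plain named group here instead of inside a (?#...) comment.
road_attrs_pattern = (
    r"(?P<rd_no>\w?\d+(-\d+)?\w?\s+)"
    r"(?P<rd_nm>[a-zA-Z \-]+)"
    r"\s+(?P<rd_tp>" + type_opts + r")"
)

# re.match only tries at index 0, where the text is "Burrah", so it fails.
anchored = re.match(road_attrs_pattern, proc_addr)
print("re.match result:", anchored)

# re.search scans forward and can start the match at "15".
found = re.search(road_attrs_pattern, proc_addr)
if found is not None:
    road_num = found.group("rd_no").strip()
    road_name = found.group("rd_nm").strip()
    road_type = found.group("rd_tp").strip()
    print("Road number:", road_num)
    print("Road name:", road_name)
    print("Road type:", road_type)
```

Checking the result for `None` before calling `.group()` avoids the `AttributeError` for addresses that contain no road number at all.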