I am using an address-parsing library that accepts a string like this:
import pyap

test_address = """
4998 Stairstep Lane Toronto ON
"""
addresses = pyap.parse(test_address, country='CA')
for address in addresses:
    # shows found address
    print(address)
    # shows address parts
    print(address.as_dict())
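I want to apply this function to every row of a single column in a pandas DataFrame. The DataFrame has two columns (id, address), and this is what I have so far: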
addresses.apply(lambda x: pyap.parse(x['address'], country='CA'),axis=1)
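It runs, but it produces a Series rather than a 'pyap.address.Address'.

You need to do what you are attempting, but the other way around: pull the addresses out of the DataFrame and pass them to pyap one at a time. Suppose your DataFrame looks like this: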
import pandas as pd

d = [{'id': '1', 'address': '4998 Stairstep Lane Toronto ON'}, {'id': '2', 'address': '1234 Stairwell Road Toronto ON'}]
df = pd.DataFrame(d)
df

  id                         address
0  1  4998 Stairstep Lane Toronto ON
1  2  1234 Stairwell Road Toronto ON
Extract those addresses into a list:
address_list = df['address'].tolist()
Then process each one with pyap:
for al in address_list:
    addresses = pyap.parse(al, country='CA')
    for address in addresses:
        print(address)
        print(address.as_dict())
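If you would rather keep the results next to each id instead of printing them, something along these lines might work. This is only a sketch: `parse_address_column` and its `parser` argument are names I made up for illustration, and it assumes the parser returns a list of objects with an `as_dict()` method, which is how `pyap.parse` is used above.

```python
import pandas as pd


def parse_address_column(df, parser, column="address"):
    """Return one row per parsed address, keeping the original columns.

    `parser` is a hypothetical argument: any callable that takes a string
    and returns a list of objects exposing `as_dict()`.
    """
    # Series.apply calls the parser once per cell; each result is a list.
    parsed = df[column].apply(parser)
    # explode() gives every list element its own row and repeats the index;
    # cells where nothing was found become NaN and are dropped.
    exploded = parsed.explode().dropna()
    # Turn each parsed address into a dict of its parts, one column per key.
    parts = pd.DataFrame([a.as_dict() for a in exploded], index=exploded.index)
    # Join on the original index so the parts line up with their id.
    return df.join(parts, how="inner", rsuffix="_parsed")


# Usage, assuming pyap is installed and `df` is the DataFrame above:
#   import pyap
#   result = parse_address_column(df, lambda s: pyap.parse(s, country="CA"))
```

Rows where no address is found are dropped by the inner join; switch to `how="left"` if you want to keep them with empty parts.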
Let me know if it works.