How do I extract just the part of a string that I need?

Question (votes: -1, answers: 2)

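I need to extract only part of each string in a table column. The part I want can be 0 to 4 characters long:

"address": "124"

I know this can be done with the extract / findall functions. However, I have only been able to set a single fixed-length pattern (mask), and only some of the rows match it. As I said, the code varies in length, so that approach does not work. Please tell me how to set up the pattern correctly.

Example row from the table column: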

{'latitude': '37.80505999961946', 'human_address': '{"address":"0","city":"Oakland","state":"Ca","zip":""}', 'needs_recoding': False, 'longitude': '-122.27301999967312'}

df['latitude_1'] = df['Location 1'].str.extract('(\"\d\d\d\d)', expand=True)
python regex pandas
2 Answers
0
votes

I hope this helps.

import pandas as pd

# Two sample rows; human_address holds a JSON-like string.
dic = {'latitude': '37.80505999961946', 'human_address': '{"address":"1234","city":"Oakland","state":"Ca","zip":""}', 'needs_recoding': False, 'longitude': '-122.27301999967312'}, {'latitude': '37.80505999961946', 'human_address': '{"address":"0","city":"Oakland","state":"Ca","zip":""}', 'needs_recoding': False, 'longitude': '-122.27301999967312'}
df = pd.DataFrame(list(dic))
df


                                       human_address           latitude            longitude  needs_recoding
0  {"address":"1234","city":"Oakland","state":"Ca...  37.80505999961946  -122.27301999967312           False
1  {"address":"0","city":"Oakland","state":"Ca","...  37.80505999961946  -122.27301999967312           False


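import re
# Match the "address" key followed by a quoted value of 0 to 4 digits.
# A raw string avoids the backslash escapes; the stray * quantifiers are dropped.
df.human_address.apply(lambda s: re.search(r'"address":"\d{0,4}"', s).group())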


0    "address":"1234"
1       "address":"0"
Name: human_address, dtype: object
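
One caveat with the lambda above: `re.search` returns `None` for a row that has no matching address, and calling `.group()` on it raises an `AttributeError`. Below is a minimal sketch of a None-safe variant that also captures only the digits. The helper name `find_address` and the third sample row are made up for illustration; they are not from the question.

```python
import re

# Capture only the digits (0 to 4 of them) inside the quoted address value.
ADDRESS_RE = re.compile(r'"address":"(\d{0,4})"')

def find_address(s):
    """Return the address digits as a string, or None when nothing matches."""
    m = ADDRESS_RE.search(s)
    return m.group(1) if m else None

rows = [
    '{"address":"1234","city":"Oakland","state":"Ca","zip":""}',
    '{"address":"0","city":"Oakland","state":"Ca","zip":""}',
    '{"city":"Oakland","state":"Ca","zip":""}',  # hypothetical row with no address key
]
results = [find_address(s) for s in rows]
```

The same helper can be passed to pandas directly, for example `df.human_address.apply(find_address)`.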

0
votes

You can indeed use pandas str.extract; you just need to adjust your regex pattern.

Below is the DataFrame from @Ananay Mital's answer.

>>> df
                                       human_address           latitude            longitude  needs_recoding
0  {"address":"1234","city":"Oakland","state":"Ca...  37.80505999961946  -122.27301999967312           False
1  {"address":"0","city":"Oakland","state":"Ca","...  37.80505999961946  -122.27301999967312           False

Here is how to get the result with str.extract:

>>> df.human_address.str.extract('(\"address\":\"\d{0,4}\")')
                  0
0  "address":"1234"
1     "address":"0"

Or, equivalently, with a raw string:

>>> df.human_address.str.extract(r'("address":"\d{0,4}")')
                  0
0  "address":"1234"
1     "address":"0"
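
If only the value is wanted (for example `124` rather than `"address":"124"`), one option is to move the capture group so that it wraps just the digits, because str.extract returns only what the group captures. This is a sketch under assumptions: the new column name `address` is my own choice, and the DataFrame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "human_address": [
        '{"address":"1234","city":"Oakland","state":"Ca","zip":""}',
        '{"address":"0","city":"Oakland","state":"Ca","zip":""}',
    ]
})

# The parentheses wrap only \d{0,4}, so the key and the quotes are matched
# but not returned. expand=False gives a Series instead of a one-column DataFrame.
df["address"] = df["human_address"].str.extract(r'"address":"(\d{0,4})"', expand=False)
```

For the two sample rows, the new column holds the strings `1234` and `0`. They are still strings, so a numeric conversion would be a separate step.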