How to extract Url data from a string

Question  Votes: 0  Answers: 3

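I have the following string, which contains many Url values. How can I extract the Url that follows each DataUrl key in this string, so that I end up with a list of Urls such as: americanexpress.com, vice.com, chegg.com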

{'DataUrl':'americanexpress.com','Country':{'Rank':'96','Reach':{'PerMillion':'7350'},'PageViews':{'PerMillion':'600.2','PerUser':'3.6'}},'Global':{'Rank':'362'}},{'DataUrl':'vice.com','Country':{'Rank':'97','Reach':{'PerMillion':'15703.61'},'PageViews':{'PerMillion':'489.97','PerUser':'1.38'}},'Global':{'Rank':'208'}},{'DataUrl':'chegg.com','Country':{'Rank':'98','Reach':{'PerMillion':'6280'},'PageViews':{'PerMillion':'882.3','PerUser':'6.2'}},'Global':{'Rank':'402'}},{'DataUrl':'mlb.com','Country':{'Rank':'99','Reach':{'PerMillion':'7280'},'PageViews':{'PerMillion':'564.1','PerUser':'3.42'}},'Global':{'Rank':'427'}},{'DataUrl':'xnxx.com','Country':{'Rank':'100','Reach':{'PerMillion':'5560'},'PageViews':{'PerMillion':'1271','PerUser':'10.1'}},'Global':{'Rank':'95'}

I have tried various findall expressions.

python string extract
3 Answers
1
vote

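Python has a built-in package called json that can be used to work with JSON data.

You can parse the JSON text into Python objects and then easily read the DataUrl values.

See https://www.w3schools.com/python/python_json.asp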
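
A minimal sketch of this approach, assuming the data is available as valid JSON (double-quoted strings, records wrapped in one top-level list). The `raw` sample is a shortened, hypothetical version of the question's data, not the original string:

```python
import json

# Hypothetical, shortened sample: valid JSON uses double quotes
# and has a single top-level value (here, a list of records).
raw = '''
[
  {"DataUrl": "americanexpress.com", "Country": {"Rank": "96"}, "Global": {"Rank": "362"}},
  {"DataUrl": "vice.com", "Country": {"Rank": "97"}, "Global": {"Rank": "208"}},
  {"DataUrl": "chegg.com", "Country": {"Rank": "98"}, "Global": {"Rank": "402"}}
]
'''

records = json.loads(raw)  # JSON text -> list of dicts
urls = [record["DataUrl"] for record in records]
print(urls)
```

Note that the string in the question uses single quotes and has no enclosing list, so `json.loads` would likely reject it unless it is adjusted first.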


1
vote

It looks like part of JSON data, so if you have the complete JSON data you can load it with the json module and look up DataUrl in the resulting dictionaries.

If you have incomplete JSON data, you can use a regex

```python
import re

text = '''{'DataUrl': 'americanexpress.com', 'Country': {'Rank': '96', 'Reach': {'PerMillion': '7350'}, 'PageViews': {'PerMillion': '600.2', 'PerUser': '3.6'}}, 'Global': {'Rank': '362'}}, {'DataUrl': 'vice.com', 'Country': {'Rank': '97', 'Reach': {'PerMillion': '15703.61'}, 'PageViews': {'PerMillion': '489.97', 'PerUser': '1.38'}}, 'Global': {'Rank': '208'}}, {'DataUrl': 'chegg.com', 'Country': {'Rank': '98', 'Reach': {'PerMillion': '6280'}, 'PageViews': {'PerMillion': '882.3', 'PerUser': '6.2'}}, 'Global': {'Rank': '402'}}, {'DataUrl': 'mlb.com', 'Country': {'Rank': '99', 'Reach': {'PerMillion': '7280'}, 'PageViews': {'PerMillion': '564.1', 'PerUser': '3.42'}}, 'Global': {'Rank': '427'}}, {'DataUrl': 'xnxx.com', 'Country': {'Rank': '100', 'Reach': {'PerMillion': '5560'}, 'PageViews': {'PerMillion': '1271', 'PerUser': '10.1'}}, 'Global': {'Rank': '95'}'''

# \s* matches with or without a space around the colon,
# so the pattern also works on the unspaced string from the question.
urls = re.findall(r"'DataUrl'\s*:\s*'([^']*)'", text)
print(urls)
```

Result

```
['americanexpress.com', 'vice.com', 'chegg.com', 'mlb.com', 'xnxx.com']
```

You can also try .split("{'DataUrl': '") followed by split("',")

```python
text = '''{'DataUrl': 'americanexpress.com', 'Country': {'Rank': '96', 'Reach': {'PerMillion': '7350'}, 'PageViews': {'PerMillion': '600.2', 'PerUser': '3.6'}}, 'Global': {'Rank': '362'}}, {'DataUrl': 'vice.com', 'Country': {'Rank': '97', 'Reach': {'PerMillion': '15703.61'}, 'PageViews': {'PerMillion': '489.97', 'PerUser': '1.38'}}, 'Global': {'Rank': '208'}}, {'DataUrl': 'chegg.com', 'Country': {'Rank': '98', 'Reach': {'PerMillion': '6280'}, 'PageViews': {'PerMillion': '882.3', 'PerUser': '6.2'}}, 'Global': {'Rank': '402'}}, {'DataUrl': 'mlb.com', 'Country': {'Rank': '99', 'Reach': {'PerMillion': '7280'}, 'PageViews': {'PerMillion': '564.1', 'PerUser': '3.42'}}, 'Global': {'Rank': '427'}}, {'DataUrl': 'xnxx.com', 'Country': {'Rank': '100', 'Reach': {'PerMillion': '5560'}, 'PageViews': {'PerMillion': '1271', 'PerUser': '10.1'}}, 'Global': {'Rank': '95'}'''

# Cut the text at the start of every record; this relies on the exact
# spacing used in text (one space after each colon).
urls = text.split("{'DataUrl': '")
# Each non-empty piece begins with the url, which ends at the first "',".
urls = [item.split("',")[0] for item in urls if item]
print(urls)
```

Result

```
['americanexpress.com', 'vice.com', 'chegg.com', 'mlb.com', 'xnxx.com']
```

If you have complete and correctly formatted JSON (using " instead of ') then you can use the json module.
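
A minimal sketch of the json route for data shaped like the question's, offered as an illustration and not as the answerer's code. It assumes that only closing braces are missing at the end of the fragment and that no key or value contains a quote character. The helper name `extract_data_urls` and the shortened `fragment` are hypothetical:

```python
import json


def extract_data_urls(fragment):
    """Repair a single-quoted, comma-separated run of records and return the DataUrl values."""
    # Assumption: only closing braces are missing at the very end of the fragment.
    missing = fragment.count("{") - fragment.count("}")
    completed = fragment + "}" * missing
    # Assumption: no key or value contains a quote, so swapping ' for " is safe.
    # Wrapping in [ ] turns the run of records into one JSON list.
    as_json = "[" + completed.replace("'", '"') + "]"
    return [record["DataUrl"] for record in json.loads(as_json)]


# Hypothetical shortened fragment in the question's format (final brace missing).
fragment = "{'DataUrl':'mlb.com','Global':{'Rank':'427'}},{'DataUrl':'xnxx.com','Global':{'Rank':'95'}"
print(extract_data_urls(fragment))
```

If the values may themselves contain quotes or braces, this simple repair is not reliable, and the regex approach is likely the safer choice.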