Beautiful Soup: getting the href of links returns no results

Question (votes: 0, answers: 1)

I am trying to retrieve only the links to the companies listed on the following page: https://clutch.co/it-services/msp

This seems to be a common problem, and I have spent a whole day going through other posts without any success.

Code:

links = []
for l in soup.find_all(class_='website-link website-link-a'):
    results = (l.get('href'))
    links.append(results)

print(links)

Output:

[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

When I print just the result of soup.find_all, I get:

<a data-extlink-pid="1219089" href="https://fulcrumdigital.com/" rel="nofollow" target="_blank">
<i class="icon icon-visit-site"></i><span class="">Visit Website</span>
</a>
</li>, etc, etc,

I need to extract the value of the href attribute, but I can't figure out how. Any suggestions would be appreciated.

python web-scraping beautifulsoup hyperlink tags
1 Answer
0 votes

You can use the CSS selector '.website-link-a > a', which selects each <a> tag that is a direct child of an element with class="website-link-a":

import requests
from bs4 import BeautifulSoup

url = 'https://clutch.co/it-services/msp'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for a in soup.select('.website-link-a > a'):
    print(a['href'])

Prints:

http://electric.ai/
http://www.symphony-solutions.com/?utm_source=clutch.co&utm_medium=referral&utm_campaign=it-services-msp
https://www.bairesdev.com/?utm_source=clutch.co&utm_medium=referral&utm_campaign=msp
https://www.helixstorm.com/?utm_source=clutch.co&utm_medium=referral&utm_campaign=it-services-msp
http://www.sundevs.com/?utm_source=clutch.co&utm_medium=referral&utm_campaign=it-services-msp
http://www.computersolutionseast.com/?utm_source=clutch.co&utm_medium=referral&utm_campaign=it-services-msp
/your-project
http://techmd.com
http://www.sugarshot.io/?utm_source=clutch.co&utm_medium=referral&utm_campaign=directory
https://www.empist.com?utm_source=clutch.co&utm_medium=referral
http://www.frameworkIT.com/?utm_source=clutch.co&utm_medium=referral
https://www.clickittech.com/
https://cyberduo.com/?utm_source=clutch.co&utm_medium=referral&utm_campaign=it-services-msp
http://www.realnets.com/?utm_source=clutch.co&utm_medium=referral
https://www.ibexlabs.com/?utm_source=clutch.co&utm_medium=referral
https://bianor.com/
http://www.endpoint.com/?utm_source=clutch.co&utm_medium=referral
https://devopsprodigy.com/?utm_source=clutch.co&utm_medium=referral&utm_campaign=directory
https://vrpconsulting.com/
https://siliconreef.co.uk/?utm_source=clutch.co&utm_medium=referral
http://www.agencypartner.com?utm_source=clutch&utm_medium=profile&utm_campaign=directory_listing
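
To see why the original loop printed None, here is a minimal offline sketch. It assumes the page markup resembles the fragment shown in the question, with the class sitting on an <li> wrapper and the href on the <a> inside it. The HTML string below is hypothetical (the second item, including its link text, is made up to mirror the /your-project entry in the output above), so treat it as an illustration rather than the site's actual structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the fragment shown in the question.
html = """
<ul>
  <li class="website-link website-link-a">
    <a data-extlink-pid="1219089" href="https://fulcrumdigital.com/" rel="nofollow" target="_blank">
      <i class="icon icon-visit-site"></i><span class="">Visit Website</span>
    </a>
  </li>
  <li class="website-link website-link-a">
    <a href="/your-project">Start a project</a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# The class filter matches the <li> wrappers, which carry no href attribute,
# so .get('href') returns None for each of them.
wrappers = soup.find_all(class_="website-link website-link-a")
wrapper_hrefs = [tag.get("href") for tag in wrappers]

# Selecting the <a> children directly reaches the tags that own the href.
all_links = [a["href"] for a in soup.select(".website-link-a > a")]

# Optional: keep only absolute URLs, dropping site-relative paths.
external_links = [
    href for href in all_links if href.startswith(("http://", "https://"))
]

print(wrapper_hrefs)
print(all_links)
print(external_links)
```

If you would rather keep the original find_all loop, calling l.find('a') on each matched element and reading the href from that result should work as well, provided every wrapper actually contains an <a> tag.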