Unable to get href links with requests or Selenium

Question

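My goal is to extract all href links from this page and find the .pdf links. I tried both the requests library and Selenium, but neither of them returns those links.

How can I solve this? Thanks.

For example: this contains a .pdf file link

Here is the requests code: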

    import requests
    from bs4 import BeautifulSoup

    # Send a browser-like User-Agent with the request
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0'}
    url = "https://www.bain.com/insights/topics/energy-and-natural-resources-report/"

    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Print the href of every <a> tag in the returned HTML
    for link in soup.find_all('a'):
        print(link.get('href'))

Here is the Selenium code:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service as ChromeService
    from webdriver_manager.chrome import ChromeDriverManager
    from bs4 import BeautifulSoup

    options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=options)

    driver.get("https://www.bain.com/insights/topics/energy-and-natural-resource-report/")
    driver.implicitly_wait(10)

    # Parse the rendered page; use driver.page_source (there is no self in this script)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for link in soup.find_all('a'):
        print(link.get('href'))

    driver.quit()

Answer 1

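The HTML of the link is

You can use the CSS selector below to find the download links:

    a[href*='pdf']

This selector simply matches A tags whose href attribute contains the string "pdf".

I don't know whether this affects BeautifulSoup the same way it affects Selenium, but the links are inside an IFRAME. The locator for the IFRAME is

    iframe[title='v1_ENR Report 2023_ToC']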
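
Putting the two selectors together, a minimal sketch might look like the following. It assumes Selenium 4.6 or later (so `webdriver.Chrome()` can locate a driver by itself), that the iframe title is still the one quoted above, and that the links appear within the timeout. The names `unique_links` and `collect_pdf_links` are hypothetical helpers introduced for illustration, not part of any library. Since an iframe is a separate document, its contents are not searchable from the outer page until the driver switches into it, which would be consistent with the outer page source listing no PDF links.

```python
PAGE_URL = "https://www.bain.com/insights/topics/energy-and-natural-resources-report/"
IFRAME_SELECTOR = "iframe[title='v1_ENR Report 2023_ToC']"  # locator from the answer
PDF_LINK_SELECTOR = "a[href*='pdf']"  # matches any href containing "pdf"


def unique_links(hrefs):
    """Drop empty/None hrefs and duplicates while keeping first-seen order."""
    result = []
    for href in hrefs:
        if href and href not in result:
            result.append(href)
    return result


def collect_pdf_links(url=PAGE_URL, timeout=10):
    """Open the page, switch into the iframe, and return the PDF link hrefs."""
    # Imported lazily so unique_links() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Wait for the iframe to load, then make it the active browsing context.
        wait.until(
            EC.frame_to_be_available_and_switch_to_it(
                (By.CSS_SELECTOR, IFRAME_SELECTOR)
            )
        )
        # Element lookups now run inside the iframe's document.
        anchors = wait.until(
            EC.presence_of_all_elements_located(
                (By.CSS_SELECTOR, PDF_LINK_SELECTOR)
            )
        )
        return unique_links(a.get_attribute("href") for a in anchors)
    finally:
        driver.quit()


# Usage (launches a real Chrome window):
#     for link in collect_pdf_links():
#         print(link)
```

Explicit waits are used here instead of `implicitly_wait`, because an implicit wait only applies to element lookups and does not delay reading `page_source`. To return to the outer page afterwards, call `driver.switch_to.default_content()`.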
