How can I crawl a website that cannot be accessed from Python?

import requests

url = "https://cafe.bithumb.com/view/boards/43?keyword=&noticeCategory=9"

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36',
    "Referer": "https://cafe.bithumb.com/",
}

try:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()

    print(response.text)
except requests.exceptions.RequestException as err:
    print(f"error: {err}")

The page loads fine when I open it in Chrome, but the same request fails from Python. What should I do?

python web-crawler
1 Answer

You are being blocked by Cloudflare. I could not find a workaround using requests, but I can suggest using Selenium to achieve your goal:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = "https://cafe.bithumb.com/view/boards/43?keyword=&noticeCategory=9"

chrome_options = Options()
chrome_options.add_argument('user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36')
chrome_options.add_argument('--headless')  # Run Chrome in headless mode

driver = webdriver.Chrome(options=chrome_options)

try:
    driver.get(url)
    page_source = driver.page_source
    print(page_source)

except Exception as e:
    print(f"error: {e}")

finally:
    driver.quit()
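
To check whether Cloudflare is really what rejects the requests call, it can help to look at the status code and the Server header of the failed response. The helper below is only a rough sketch: the function name looks_like_cloudflare_block is hypothetical, and the rule it applies (a Server header containing "cloudflare" together with a 403, 429 or 503 status) is an assumed heuristic, not an official detection method.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic guess: does this response look like a Cloudflare block or challenge?"""
    # Lower-case header names and values so a plain dict works as well as
    # the case-insensitive headers object returned by requests.
    normalized = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    served_by_cloudflare = "cloudflare" in normalized.get("server", "")
    # Assumption: blocks and challenges usually come back as 403, 429 or 503.
    blocked_status = status_code in (403, 429, 503)
    return served_by_cloudflare and blocked_status


# Usage with the question's code (needs network access, so left commented out):
# response = requests.get(url, headers=headers, timeout=10)
# print(looks_like_cloudflare_block(response.status_code, response.headers))
```

If this returns True, the server answered but refused the client, which points to bot protection rather than a network problem, and a real browser driven by Selenium is a reasonable next step.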
