Error code 403 when web scraping the Zoopla website

Question (votes: 0, answers: 1)

I am trying to scrape postcodes from Zoopla.co.uk, but I keep getting HTTP error 403. Does the site think I am a bot?

Here is my code:

import requests
from bs4 import BeautifulSoup as soup

head = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
}

url = "https://www.zoopla.co.uk/to-rent/property/west-midlands/handsworth/sandwell-road/b21-8nl/?q=B21%208NL&radius=1"

# Headers must be passed by keyword: the second positional argument of requests.get() is `params`
data = requests.get(url, headers=head)
print(data.status_code)
python http web-scraping http-status-code-403
1 Answer
0 votes

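The site is protected by Cloudflare and, as you suspected, it is blocking your requests. With Selenium and a few tweaks you may be able to reach your goal, something like this: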

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Hide the most obvious automation signals that Chrome exposes by default
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

# The driver must be created before any script can be executed on it
driver = webdriver.Chrome(options=options)
# Note: this only patches the document that is currently loaded
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")

If you send a lot of requests, don't forget to use time.sleep(); it can help you avoid being detected as a bot.
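The time.sleep() advice could be sketched roughly as below. This is only an illustration, not a guaranteed way past Cloudflare: the helper name `fetch_politely`, its parameters, and the 2 to 5 second delay range are assumptions made for the example. The `fetch` argument is whatever function actually loads a page, and `sleep` is injectable so the pauses can be skipped when testing.

```python
import random
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    min_delay: float = 2.0,  # assumed lower bound, in seconds
    max_delay: float = 5.0,  # assumed upper bound, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call fetch(url) for each URL, pausing a random interval between calls."""
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("need 0 <= min_delay <= max_delay")

    results: List[T] = []
    for index, url in enumerate(urls):
        if index > 0:
            # Randomised pause so requests do not arrive at a fixed rhythm
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

With the Selenium driver from the snippet above, `fetch` could be a small function that calls driver.get(url) and returns driver.page_source.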
