My problem is that I can't capture the class listing_LinkedListingCard__5SRvZ. I'm using find_all, but the result is an empty list.
Link: https://sa.aqar.fm/%D9%81%D9%84%D9%84-%D9%84%D9%84%D8%A8%D9%8A%D8%B9/%D8%A8%D8%B1%D9%8A%D8%AF%D8%A9
My code:
import requests
from bs4 import BeautifulSoup
import time

web_page = requests.get("https://sa.aqar.fm/%D9%81%D9%84%D9%84-%D9%84%D9%84%D8%A8%D9%8A%D8%B9/%D8%A8%D8%B1%D9%8A%D8%AF%D8%A9")

def main(page):
    # Parse the raw response body with the lxml parser.
    src = page.content
    soup = BeautifulSoup(src, 'lxml')
    house_details = []
    # Look up the listing cards by their CSS class.
    houses = soup.find_all("div", {'class': 'listing_LinkedListingCard__5SRvZ'})
    print(houses)

main(web_page)
Any help?
How can I solve this?
Your scrape is being blocked by Cloudflare. Check soup.title to confirm.
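A minimal sketch of that check, in case it helps. The helper names `page_title` and `looks_like_challenge` are hypothetical, and the marker strings are an assumption about what a Cloudflare challenge page typically puts in its title, not something taken from this site's actual response. It uses the built-in `html.parser` so it does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Assumed substrings that often appear in the title of a challenge page.
# These are illustrative guesses, not a complete or authoritative list.
CHALLENGE_MARKERS = ("just a moment", "attention required", "cloudflare")


def page_title(html):
    """Return the text of the <title> tag, or None if the page has no title."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)


def looks_like_challenge(title):
    """Heuristic: True if the title contains one of the assumed markers."""
    if not title:
        return False
    lowered = title.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```

With the question's code, this could be called as `looks_like_challenge(page_title(web_page.content))`. If it returns True, the HTML you parsed is most likely the block page rather than the listings, which would explain the empty list.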
When you use the requests API, you send a User-Agent string that tells the server you are not a regular browser:
web_page.request.headers
>> {'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}
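One thing worth trying is to send a browser-like User-Agent instead of the default one. The sketch below is only an illustration: the User-Agent value is an assumed example string, `make_session` and `fetch` are hypothetical helper names, and there is no guarantee that changing the header alone gets past Cloudflare, since the site may also require JavaScript or cookies.

```python
import requests

# Assumed example of a browser-like User-Agent; substitute the value
# your own browser sends if you prefer.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
}


def make_session():
    """Create a requests session that sends the browser-like headers."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session


def fetch(url, timeout=10):
    """Fetch a URL with the custom headers (performs a network request)."""
    session = make_session()
    return session.get(url, timeout=timeout)
```

After `web_page = fetch(url)`, inspecting `web_page.request.headers` again should show the new User-Agent in place of `python-requests/...`. If find_all still returns an empty list, the listings may be rendered client-side, in which case a different approach would be needed.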