Problem encountered while web scraping with BeautifulSoup

Problem description — Votes: 0, Answers: 1

My problem is that I cannot capture elements with the class listing_LinkedListingCard__5SRvZ. I use find_all, but the result is an empty list.

Link: https://sa.aqar.fm/%D9%81%D9%84%D9%84-%D9%84%D9%84%D8%A8%D9%8A%D8%B9/%D8%A8%D8%B1%D9%8A%D8%AF%D8%A9

My code:

import requests
from bs4 import BeautifulSoup
import time

web_page = requests.get("https://sa.aqar.fm/%D9%81%D9%84%D9%84-%D9%84%D9%84%D8%A8%D9%8A%D8%B9/%D8%A8%D8%B1%D9%8A%D8%AF%D8%A9")

def main(page):
    src = page.content
    soup = BeautifulSoup(src, 'lxml')
    house_details = []

    houses = soup.find_all("div", {'class': 'listing_LinkedListingCard__5SRvZ'})
    print(houses)
    


main(web_page)

Any help?

How can I solve it?

python web beautifulsoup screen-scraping
1 Answer

Score: 0

Your scraping is being blocked by Cloudflare; check soup.title.
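A quick way to see whether you received a Cloudflare challenge page instead of the real listing page is to inspect the parsed title. This is a minimal sketch; the exact challenge-page title ("Just a moment...") is an assumption and can vary:

```python
from bs4 import BeautifulSoup

# Stand-in for page.content: the kind of HTML a Cloudflare challenge returns.
html = "<html><head><title>Just a moment...</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

# If the title looks like a challenge page, find_all on the real listing
# classes will return an empty list, because the listings were never sent.
blocked = soup.title is not None and "just a moment" in soup.title.text.lower()
print(blocked)
```

With the real response, replace the literal html string with web_page.content.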

When you use the requests API, you are sending a User-Agent string that tells the server you are not a regular browser:

web_page.request.headers
>> {'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}
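A common first step is to send a browser-like User-Agent so the request no longer identifies itself as python-requests. This is a sketch: the User-Agent string below is just an example, and Cloudflare may still block the request through other checks (JavaScript challenges, TLS fingerprinting):

```python
import requests

URL = ("https://sa.aqar.fm/%D9%81%D9%84%D9%84-%D9%84%D9%84%D8%A8%D9%8A%D8%B9/"
       "%D8%A8%D8%B1%D9%8A%D8%AF%D8%A9")

# Example browser User-Agent; any recent real-browser string should do.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}

def fetch(url: str) -> requests.Response:
    # headers= replaces the default 'python-requests/...' User-Agent.
    return requests.get(url, headers=HEADERS, timeout=30)
```

If a spoofed User-Agent is not enough, tools that execute JavaScript in a real browser engine (e.g. Selenium or Playwright) are the usual fallback.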