Why can't Scrapy scrape the data even after I set the User-Agent?

Question (votes: 0, answers: 1)

I am learning Scrapy and want to scrape this site (the URL in start_urls below).

In my spider:

import scrapy

class TencentHrSpider(scrapy.Spider):
    name = 'tencent_hr'
    allowed_domains = ['careers.tencent.com']
    start_urls = ['http://careers.tencent.com/search.html']

    def parse(self, response):

        div_list = response.xpath('//div[@class="recruit-list"]')

        print(div_list)  # prints `[]`: the selector matches nothing

When I run the crawl, no data is output. Why?

I have already set the User-Agent request header in settings.py:

import random

USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
]

# random.choice runs once, when settings.py is imported, so a single
# User-Agent is picked for the whole crawl (not one per request).
USER_AGENT = random.choice(USER_AGENT_LIST)

EDIT-01

Is there a way to find the cause? Is there any error log I can trace?


EDIT-02

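Why can't Scrapy get the data when the page loads it from an API via AJAX? We know Scrapy can download the whole page, but can it also run scripts the way a browser does?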

python scrapy
1 Answer

0 votes

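The site builds its content with JavaScript, which makes it harder to scrape: Scrapy downloads the HTML but does not execute scripts, so elements that the scripts add later (such as the recruit-list div) are not in the response. The page linked below explains how to handle this. Please let me know if it helps.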

https://www.accordbox.com/blog/scrapy-tutorial-11-how-to-extract-data-from-native-javascript-statement/
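
As a rough illustration of the idea, here is a minimal sketch using only the standard library. It first checks whether the target markup is present in the raw HTML at all, and then parses a JSON body of the kind an AJAX endpoint might return. The payload shape (`{"data": {"posts": [...]}}`), the field names `title` and `location`, and both helper names are assumptions made up for this example; the real endpoint and keys have to be read from the browser's Network tab.

```python
import json


def html_has_marker(html: str, marker: str = 'class="recruit-list"') -> bool:
    """Return True if the raw, unrendered HTML already contains the marker."""
    return marker in html


def extract_posts(json_text: str) -> list:
    """Pull job records out of a JSON API response body.

    The payload shape and key names are hypothetical; adjust them to
    match what the real API returns.
    """
    payload = json.loads(json_text)
    posts = (payload.get("data") or {}).get("posts") or []
    return [
        {"title": post.get("title"), "location": post.get("location")}
        for post in posts
    ]


if __name__ == "__main__":
    # What a JavaScript-driven page typically looks like before scripts run:
    # an empty container plus a script tag, with no job list in it.
    shell = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
    print(html_has_marker(shell))

    # A made-up API response body, for illustration only.
    sample = '{"data": {"posts": [{"title": "Engineer", "location": "Shenzhen"}]}}'
    print(extract_posts(sample))
```

In a spider, the same check could be run on `response.text` inside `parse`. If the marker is missing there, one option is to yield a `scrapy.Request` to the API URL found in the Network tab and pass that response's `response.text` to a function like `extract_posts` in the callback.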
