How do I deploy a Selenium-driven spider to the cloud?

Question (votes: 0, answers: 1)

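I use scrapyd to deploy and schedule my spiders on my local machine. The challenge I'm facing now is deploying a spider that runs with a headless browser.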

I get two errors in the scrapyd log files, both related to the webdriver not being found in the project directory:

FileNotFoundError: [Errno 2] No such file or directory: './chromedriver'

selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. 
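  • Can a headless browser run in the cloud?
  • Does chromedriver get dropped when the project is deployed?
  • Is there a way to view my project files in scrapyd, to check whether the file is still in the project directory?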
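On the third question: scrapyd-deploy packages a project as a Python egg, and an egg is an ordinary zip archive, so its contents can be listed with the standard `zipfile` module. The sketch below assumes you can locate the uploaded egg on the scrapyd host; the path in the usage comment (`eggs/<project>/<version>.egg`) is an assumption based on scrapyd's default `eggs_dir` setting, so check your own configuration. A generated egg typically contains only Python packages, so a `chromedriver` binary sitting in the project folder is most likely not shipped, and an executable could not be run from inside a zip archive in any case.

```python
import sys
import zipfile
from pathlib import Path


def list_egg_contents(egg_path):
    """Return the sorted member names of an egg (eggs are zip archives)."""
    with zipfile.ZipFile(egg_path) as egg:
        return sorted(egg.namelist())


def egg_contains(egg_path, filename):
    """True if any member of the egg has the given base name, e.g. 'chromedriver'."""
    return any(Path(name).name == filename for name in list_egg_contents(egg_path))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python inspect_egg.py eggs/myproject/1600000000.egg
    egg = sys.argv[1]
    for name in list_egg_contents(egg):
        print(name)
    print("chromedriver packaged:", egg_contains(egg, "chromedriver"))
```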

Here is my code:

# I'm using SeleniumRequest for my requests, so this is the configuration in my settings file


chrome_path='./chromedriver'
SELENIUM_DRIVER_NAME = 'chrome' # Change to your browser name
SELENIUM_DRIVER_EXECUTABLE_PATH = chrome_path
SELENIUM_DRIVER_ARGUMENTS=['--headless']  # '--headless' if using chrome instead of firefox

FEED_EXPORT_ENCODING='utf-8'
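The `./chromedriver` value above is a relative path, so it is resolved against the process's current working directory rather than the folder that contains settings.py; a crawl launched by scrapyd may start in scrapyd's own working directory instead of your project folder. The small experiment below uses only the standard library: it creates a placeholder file named `chromedriver` in one temporary directory (a stand-in, not a real driver) and checks the same relative path from two working directories.

```python
import os
import tempfile


def exists_from(cwd, relative_path):
    """Return True if `relative_path` exists when resolved from working directory `cwd`."""
    previous = os.getcwd()
    os.chdir(cwd)
    try:
        return os.path.exists(relative_path)
    finally:
        os.chdir(previous)  # restore the caller's working directory


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as elsewhere:
        # Placeholder file standing in for the driver binary in the project folder.
        open(os.path.join(project, "chromedriver"), "w").close()
        print("from project dir:", exists_from(project, "./chromedriver"))
        print("from another dir:", exists_from(elsewhere, "./chromedriver"))
```

Run as a script, it prints `True` for the project directory and `False` for the other one: the file never moved, only the working directory changed.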

And this is my spider code:

import scrapy
from scrapy_selenium import SeleniumRequest
from scrapy.selector import Selector
import time


class CovidngSpider(scrapy.Spider):
    name = 'covidng'
    #allowed_domains = ['covid19.ncdc.gov.ng']
    #start_urls = ['https://covid19.ncdc.gov.ng/']

    # start_requests and parse must be indented so they are methods of the spider class
    def start_requests(self):
        yield SeleniumRequest(
            url='https://covid19.ncdc.gov.ng/',
            wait_time=3,
            screenshot=True,
            callback=self.parse,
        )

    def parse(self, response):
        # scrapy-selenium exposes the WebDriver that rendered the page in response.meta['driver']
        driver = response.meta['driver']
        page_html = driver.page_source
        new_resp = Selector(text=page_html)

        databox = new_resp.xpath("//table[@id='custom3']/tbody/tr")

        for rows in databox:
            state = rows.xpath(".//td[1]/p/text()").get()
            total_cases = rows.xpath(".//td[2]/p/text()").get()
            active_cases = rows.xpath(".//td[3]/p/text()").get()
            discharged = rows.xpath(".//td[4]/p/text()").get()
            death = rows.xpath(".//td[5]/p/text()").get()

            yield {
                'State': state,
                'Total Cases': total_cases,
                'Active Cases': active_cases,
                'Discharged': discharged,
                'Death': death,
            }

Tags: python selenium web-scraping headless-browser scrapyd
1 answer

0 votes

First: check whether you have actually installed chromedriver. It is not part of Selenium, so you always have to install it separately (or geckodriver, if you use Firefox).
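A quick way to run that check on the deployment machine, sketched with the standard library: `shutil.which` looks the driver up on `PATH` the same way a shell would, and both chromedriver and geckodriver accept a `--version` flag. The script name and the exact output depend on your setup; it only reports what it finds.

```python
import shutil
import subprocess


def driver_version(name="chromedriver"):
    """Return the `--version` output of a driver found on PATH, or None if it is not installed."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or result.stderr.strip()


if __name__ == "__main__":
    for driver in ("chromedriver", "geckodriver"):
        print(driver, "->", driver_version(driver) or "not found on PATH")
```

Run it as the same user that runs scrapyd, since `PATH` can differ between users and service environments.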

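Second: use an absolute path such as /full/path/to/chromedriver. The system may run your code from a different folder than the one you expect, and then the relative path ./chromedriver may not point where you expect.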
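Applied to the settings file from the question, that advice could look like the sketch below. It assumes chromedriver is installed on the server; `CHROMEDRIVER_PATH` is a hypothetical environment variable chosen for this example (scrapy-selenium does not read it itself), and `/usr/local/bin/chromedriver` is only a placeholder default to replace with your real install location.

```python
import os
import shutil

# Hypothetical fallback location; change it to wherever chromedriver lives on the server.
DEFAULT_CHROMEDRIVER = "/usr/local/bin/chromedriver"


def resolve_chromedriver(env=None):
    """Pick an absolute chromedriver path that does not depend on the working directory.

    Order: the hypothetical CHROMEDRIVER_PATH variable, then a PATH lookup,
    then DEFAULT_CHROMEDRIVER.
    """
    env = os.environ if env is None else env
    candidate = (
        env.get("CHROMEDRIVER_PATH")
        or shutil.which("chromedriver")
        or DEFAULT_CHROMEDRIVER
    )
    return os.path.abspath(candidate)


SELENIUM_DRIVER_NAME = "chrome"
SELENIUM_DRIVER_EXECUTABLE_PATH = resolve_chromedriver()
SELENIUM_DRIVER_ARGUMENTS = ["--headless"]

FEED_EXPORT_ENCODING = "utf-8"
```

Because the resulting value is absolute, it no longer matters which directory scrapyd starts the crawl from; it only has to point at a driver that actually exists on that machine.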

© www.soinside.com 2019 - 2024. All rights reserved.