Scrapy-selenium error: TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

Question · votes: 0 · answers: 2

I'm trying to set up scrapy-selenium to do some scraping: I pip-installed scrapy and scrapy-selenium, downloaded chromedriver.exe and put it in my project directory, and updated settings.py:

from shutil import which
  
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
SELENIUM_DRIVER_ARGUMENTS = ['--headless']
  
DOWNLOADER_MIDDLEWARES = {
     'scrapy_selenium.SeleniumMiddleware': 800
     }

I also tried giving the full path to the chromedriver location instead of the which() call, roughly as sketched below (the path shown is a placeholder, not my actual location).
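
# settings.py - variant with an explicit driver path instead of which(); the path is a placeholder
SELENIUM_DRIVER_EXECUTABLE_PATH = r'C:\path\to\chromedriver.exe'

Either way, I get this error and I don't know why: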

2023-06-20 10:48:59 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\crawler.py", line 240, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\crawler.py", line 244, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\twisted\internet\defer.py", line 1947, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\twisted\internet\defer.py", line 1857, in _cancellableInlineCallbacks
    _inlineCallbacks(None, gen, status, _copy_context())
--- <exception caught here> ---
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\twisted\internet\defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\crawler.py", line 129, in crawl
    self.engine = self._create_engine()
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\crawler.py", line 143, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\core\engine.py", line 100, in __init__
    self.downloader: Downloader = downloader_cls(crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\core\downloader\__init__.py", line 97, in __init__
    DownloaderMiddlewareManager.from_crawler(crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\middleware.py", line 68, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\middleware.py", line 44, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\utils\misc.py", line 170, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy_selenium\middlewares.py", line 67, in from_crawler
    middleware = cls(
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy_selenium\middlewares.py", line 51, in __init__
    self.driver = driver_klass(**driver_kwargs)
builtins.TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'


Can anyone help with this?

python selenium-webdriver scrapy scrapy-selenium
2 Answers
1 vote

This GitHub issue helped me solve the problem: https://github.com/clemfromspace/scrapy-selenium/issues/128. Note that I use Scrapy to build the web scraper and Selenium to interact with the website.

  • Go to ton77v's commit 5c3fe7b and copy its code into middlewares.py
  • Replace the middlewares.py code under the scrapy_selenium package on your local machine (for me it is located at C:/Users//AppData/Local/anaconda3/Lib/site-packages/scrapy_selenium/middlewares.py)
  • [Optional]: I also had to !pip install webdriver-manager. For your Scrapy spider, you need to modify the settings.py file (one of the configuration files that appear when you start a Scrapy project, alongside items.py, middlewares.py and pipelines.py). Add the following lines to settings.py:
    • SELENIUM_DRIVER_NAME = 'chrome'
    • SELENIUM_DRIVER_EXECUTABLE_PATH = None  # not actually necessary, will work even if you comment this line out
    • SELENIUM_DRIVER_ARGUMENTS = []  # put '--headless' in the brackets to prevent the browser popup
  • Then, in a terminal, run
    scrapy runspider <scraper_name>.py
    and enjoy!
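
For reference, a minimal spider that works with this setup might look like the sketch below (the spider name, URL and parse logic are placeholders, not part of the original answer):

# my_spider.py - minimal scrapy-selenium spider (names and URL are placeholders)
import scrapy
from scrapy_selenium import SeleniumRequest


class MySpider(scrapy.Spider):
    name = 'my_spider'

    def start_requests(self):
        # SeleniumRequest routes the request through the Selenium middleware
        yield SeleniumRequest(url='https://example.com', callback=self.parse)

    def parse(self, response):
        # the rendered page comes back as a normal Scrapy response;
        # the underlying driver is also exposed via response.request.meta['driver']
        yield {'title': response.css('title::text').get()}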

A quick explanation of what's happening:

  • You install the browser driver manager (webdriver-manager) alongside Scrapy, and you no longer have to specify the driver location yourself (a simplified sketch of this pattern follows below)
  • The nice part is that after the first run installs the driver, webdriver-manager remembers where it was installed and reuses it on subsequent runs
  • You can adapt the scraper to open other browsers by modifying the middlewares.py file (let ChatGPT do it for you XD) and changing SELENIUM_DRIVER_NAME = <browser name>
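
As a rough illustration of what the patched middleware ends up doing (a simplified sketch, not the exact code from ton77v's commit):

# Simplified sketch: building a headless Chrome driver via webdriver-manager (Selenium 4 style)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument('--headless')

# install() downloads a matching chromedriver on the first run, then reuses
# the cached copy; Service replaces the removed executable_path argument
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)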

0 votes

Selenium switched from using executable_path to a Service object when creating the webdriver. That change is not reflected in the current release of the scrapy-selenium package. To fix this, I suggest:
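
To illustrate the underlying API change (the driver path below is a placeholder):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 3.x style - the keyword that was removed, hence the TypeError:
# driver = webdriver.Chrome(executable_path='/path/to/chromedriver')

# Selenium 4.x style - the path is wrapped in a Service object instead:
driver = webdriver.Chrome(service=Service(executable_path='/path/to/chromedriver'))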

  1. 在 GitHub 上分叉该项目:https://github.com/clemfromspace/scrapy-selenium/fork

  2. In scrapy_selenium/middlewares.py, create a Service object and pass it instead of executable_path when creating the webdriver object (similar to the changes in this PR: https://github.com/clemfromspace/scrapy-selenium/pull/135/files):

    # inside SeleniumMiddleware.__init__, once driver_klass, driver_options and
    # webdriver_base_path have been resolved
    if driver_executable_path is not None:
        # wrap the executable path in the driver's Service class (Selenium 4 API)
        service_module = import_module(f'{webdriver_base_path}.service')
        service_klass = getattr(service_module, 'Service')
        service_kwargs = {
            'executable_path': driver_executable_path,
        }
        service = service_klass(**service_kwargs)
        # pass the Service object instead of executable_path
        driver_kwargs = {
            'service': service,
            'options': driver_options
        }
        self.driver = driver_klass(**driver_kwargs)
    
  3. Run the unit tests to confirm everything still works as expected:

    python -m unittest discover -p "test_*.py"

  4. Commit and push your changes

  5. pip uninstall scrapy-selenium

  6. pip install git+{https://your_repository}

Note: when setting the package up in your project, you can use the same configuration in settings.py as before.
