Scrapy Playwright page methods: preventing a timeout error when a selector is not found

Question

My question concerns Scrapy Playwright: how do I keep the spider's page from crashing when a particular selector cannot be found?

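Below is a Scrapy spider that uses Playwright to interact with a website. The spider waits for the cookie button to appear and then clicks it. The selector and the actions are defined in the Request object's meta attribute, as PageMethod entries in the list stored under the playwright_page_methods key. If the GDPR button is not present, the page crashes with a timeout error: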

playwright._impl._errors.TimeoutError: Timeout 30000ms exceeded.

from typing import Iterable
import scrapy
from scrapy_playwright.page import PageMethod

GDPR_BUTTON_SELECTOR = "iframe[id^='sp_message_iframe'] >> internal:control=enter-frame >> .sp_choice_type_11"


class GuardianSpider(scrapy.Spider):
    name = "guardian"
    allowed_domains = ["www.theguardian.com"]
    start_urls = ["https://www.theguardian.com"]

    def start_requests(self) -> Iterable[scrapy.Request]:
        url = "https://www.theguardian.com"
        yield scrapy.Request(
            url,
            meta=dict(
                playwright=True,
                playwright_include_page=True,
                playwright_page_methods=[
                    PageMethod("wait_for_selector", GDPR_BUTTON_SELECTOR),
                    PageMethod("dispatch_event", GDPR_BUTTON_SELECTOR, "click"),
                ],
            ),
        )

    def parse(self, response):
        pass

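If you run the spider and the cookie button appears, everything works. If the cookie button is absent, however, the spider crashes with the timeout error above.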

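This is not how I want to handle the GDPR button. I want a function that checks whether the button exists and clicks it only if it does. Below is a simple Python Playwright function that does exactly that: it takes a Page object and checks for the GDPR button. If the button is present, it clicks it; otherwise it does nothing.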

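from playwright.sync_api import Page

def accept_gdpr(page: Page) -> None:
    # count() does not wait for the element, so a missing button returns 0
    # instead of raising a TimeoutError.
    if page.locator(GDPR_BUTTON_SELECTOR).count():
        page.locator(GDPR_BUTTON_SELECTOR).dispatch_event("click")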
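scrapy-playwright drives Playwright through its async API, so a page obtained inside a Scrapy callback is a playwright.async_api.Page rather than the sync Page used above. A minimal sketch of the same check written for the async API, assuming the selector from the spider; the bool return value is an illustrative addition so the caller can tell whether a click happened.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed for the type hint; callbacks receive an async Page.
    from playwright.async_api import Page

# Same selector as in the spider above.
GDPR_BUTTON_SELECTOR = (
    "iframe[id^='sp_message_iframe'] >> internal:control=enter-frame >> .sp_choice_type_11"
)


async def accept_gdpr(page: Page) -> bool:
    """Click the GDPR button if it is currently present; return True if clicked."""
    # page.locator() is synchronous and just builds a Locator; nothing is queried yet.
    button = page.locator(GDPR_BUTTON_SELECTOR)
    # count() checks the current DOM without waiting, so an absent button gives 0.
    if await button.count():
        await button.dispatch_event("click")
        return True
    return False
```

Because count() does not wait, this only sees a banner that is already in the DOM when it runs.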

How can I implement the same behavior in a Scrapy spider?

Answer 1

Try this:

# query_selector() returns None immediately if nothing matches, instead of waiting.
gdpr_button = page.query_selector(GDPR_BUTTON_SELECTOR)

if gdpr_button:
    page.locator(GDPR_BUTTON_SELECTOR).dispatch_event("click")
else:
    pass  # do something else
