How to prevent Python Playwright from timing out when taking many screenshots


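I am trying to use Playwright to take multiple screenshots of a Reddit thread. It works by starting at the XPath of the first element in the thread and then incrementing that XPath until it reaches one that does not exist. It works for the first 27 screenshots, but after that it times out.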


import pickle

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    # Restore a saved login session from pickled cookies.
    with open("cookies.pkl", "rb") as f:
        context.add_cookies(pickle.load(f))
    # This is the thread it opens to begin taking screenshots.
    page.goto('https://www.reddit.com/r/AskReddit/comments/11iwm33/who_is_a_bad_guy_in_history_who_actually_wasnt_a/')
    # Reload a few times just to make sure there are no popups.
    i = 2
    page.reload()
    page.set_viewport_size({"width": 640, "height": 480})
    page.reload()
    page.reload()
    # Screenshot div[i] for increasing i until an invalid XPath is reached.
    while True:
        try:
            page.locator(f"xpath=/html/body/div[1]/div/div[2]/div[2]/div/div/div/div[2]/div[3]/div[1]/div[3]/div[6]/div/div/div/div[{i}]").screenshot(path=f"screenshots/{i}.png")
            i += 1
        except Exception as e:
            # Any failure, including a timeout, ends the loop.
            print(e)
            break

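I haven't tried much, because there seem to be almost no answers that address this problem, although I suspect the cause is that too much JavaScript is executing on the page, which leads to the timeout. Also, the screenshots get slower and slower as they are taken, right up to the exception that breaks the screenshot loop. I also wrote the exact same program with Selenium and got the same result. I thought changing libraries might give a different outcome, but I ran into exactly the same error.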

This is the error:

Timeout 29656ms exceeded.
=========================== logs ===========================
taking element screenshot
  waiting for element to be visible and stable
    element is not visible - waiting...
============================================================

Edit: this problem may be happening because it is trying to take one huge screenshot.

selenium-webdriver web-scraping chromium playwright-python
1 answer

A blank div was causing this error.
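One way to work around it, as a minimal sketch rather than a definitive fix: check whether each div exists and is visible before screenshotting it, and skip the ones that are not. This assumes the blank div is reported as not visible by Playwright, which matches the "element is not visible - waiting..." line in the log. The names `screenshot_comments`, `COMMENT_XPATH`, and `timeout_ms` are hypothetical and only used for illustration; `page` is the Playwright page from the question.

```python
from pathlib import Path

# XPath from the question, with the trailing index left as a placeholder.
COMMENT_XPATH = (
    "xpath=/html/body/div[1]/div/div[2]/div[2]/div/div/div/div[2]"
    "/div[3]/div[1]/div[3]/div[6]/div/div/div/div[{index}]"
)


def screenshot_comments(page, out_dir="screenshots", start=2, timeout_ms=5000):
    """Screenshot div[start], div[start + 1], ... until the XPath matches nothing.

    Divs that exist but are not visible (for example a blank div) are
    skipped instead of being waited on until the timeout expires.
    Returns the list of file paths that were written.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    i = start
    while True:
        comment = page.locator(COMMENT_XPATH.format(index=i))
        if comment.count() == 0:
            break  # no element at this index: end of the thread
        if comment.is_visible():
            path = str(Path(out_dir) / f"{i}.png")
            # A short timeout (hypothetical value) makes real failures surface quickly.
            comment.screenshot(path=path, timeout=timeout_ms)
            saved.append(path)
        # Not visible: skip this index and keep going.
        i += 1
    return saved
```

Calling `screenshot_comments(page)` in place of the `while True` loop from the question should then continue past the blank div instead of stopping there. Note that `is_visible()` returns immediately without waiting, so comments that have not been rendered yet would also be skipped.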
