Playwright: download as PDF via the print dialog?

Question · Votes: 0 · Answers: 2

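I'm trying to scrape a web page with Playwright.

I load the page and successfully click the download button with Playwright. This opens a print dialog with a printer preselected.

I want to select "Save as PDF" and then click the "Save" button.

Here is my current code: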

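import os

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

# url_to_start_from and DOWNLOADED_PDF_FOLDER are defined elsewhere in the script.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    playwright_page = browser.new_page()
    got_error = False

    try:
        playwright_page.goto(url_to_start_from)
        print(playwright_page.title())
        html = playwright_page.content()
    except Exception as e:
        print(f"Playwright exception: {e}")
        got_error = True

    if not got_error:
        soup = BeautifulSoup(html, 'html.parser')

        # Download the PDF: wait for a download event triggered by the click.
        with playwright_page.expect_download() as download_info:
            playwright_page.locator("text=download").click()

        download = download_info.value
        # save_as() expects a file path, so append the suggested file name to the folder.
        download.save_as(os.path.join(DOWNLOADED_PDF_FOLDER, download.suggested_filename))

    browser.close()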

Is there a way to do this with Playwright?

automation download playwright headless-browser playwright-python
2 Answers
4
votes

You don't actually need the print dialog. You can generate the PDF directly from Playwright by emulating the print media type.

await page.emulateMedia({ media: "print" });
await page.goto("https://robstarbuck.uk/cv");
await page.pdf({ path: "./cv.pdf", format: "A4" });

This is how I generate my CV.
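
Since the question uses the Python bindings, here is a rough Python sketch of the same idea. The helper names `pdf_path_for` and `save_page_as_pdf` and the file-naming scheme are assumptions made for illustration, not part of Playwright. Note that `page.pdf()` is documented as Chromium-only and is intended for headless mode.

```python
from pathlib import Path
from urllib.parse import urlparse


def pdf_path_for(url: str, folder: str = ".") -> Path:
    """Hypothetical helper: derive an output file name from the URL."""
    parsed = urlparse(url)
    # Join host and path, then flatten the slashes into underscores.
    slug = (parsed.netloc + parsed.path).strip("/").replace("/", "_") or "page"
    return Path(folder) / f"{slug}.pdf"


def save_page_as_pdf(url: str, folder: str = ".") -> Path:
    """Render the page with its print stylesheet and write it out as a PDF."""
    # Imported here so the path helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    out_path = pdf_path_for(url, folder)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.emulate_media(media="print")  # use the print CSS, as the dialog would
        page.goto(url)
        page.pdf(path=str(out_path), format="A4")
        browser.close()
    return out_path
```

Calling `save_page_as_pdf("https://robstarbuck.uk/cv")` should write the PDF into the current directory without clicking the page's download button at all.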


2
votes

Many thanks to @KJ in the comments, who pointed out that with headless=True, Chromium never brings up the print dialog in the first place.
