Python, Selenium: save a full-page screenshot as a .pdf, with no page breaks, regardless of page dimensions

Question (votes: 0, answers: 2)

So far, I have found that Selenium can create screenshots. However, they are always .png files. How can I take the same style of screenshot as a .pdf?

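Required style: no margins; same dimensions as the current page (like a full-page screenshot).
Printing the page does not achieve this, because of all the formatting that comes with printing.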

How I currently take the screenshot:

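from selenium import webdriver

# Output path for the screenshot (placeholder; the original snippet did not define it)
PNG_SAVEAS = 'screenshot.png'

# Helper that reads the full page size, e.g. S('Height') returns scrollHeight
S = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)

# Browser options (placeholder; the original snippet used `options` without defining it)
options = webdriver.FirefoxOptions()

driver = webdriver.Firefox(options=options)
driver.get('https://www.google.com')

# Full page dimensions
height = S('Height')
width = S('Width')

# Resize the window to the full page, then capture it
driver.set_window_size(width, height)
driver.get_screenshot_as_file(PNG_SAVEAS)

driver.close()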
python selenium-webdriver geckodriver
2 Answers
2 votes

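To get the desired result, I found a solution that is not easy to find elsewhere.

The key is to set the PDF page width and height dynamically so that they match the content being printed. In addition, I found that scaling the result down to 1% of the original size speeds up the process significantly.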

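One thing to note: with GeckoDriver I ran into a bug (see the issue linked in the code below) that caused the generated PDF to be printed at the wrong size. However, I found that multiplying the size by 2.5352112676056335 fixes the problem. I am still not sure why this particular constant matters here, but without this fix the PDF's aspect ratio is distorted (rather than the page being scaled proportionally to 39% of the desired size). The distortion produces a multi-page .pdf file, which is not the intended result.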

This method was tested with GeckoDriver. If you are using Chrome, the RATIO_MULTIPLIER workaround may not be needed.

from selenium import webdriver
from selenium.webdriver.common.print_page_options import PrintOptions
import base64

# Bug in geckodriver... seems unrelated, but this won't work otherwise.
# https://github.com/SeleniumHQ/selenium/issues/12066
RATIO_MULTIPLIER = 2.5352112676056335

# Function to find page size
S = lambda X: driver.execute_script('return document.body.parentNode.scroll'+X)

# Scale factor for the PDF page size. 1 (no scaling) takes a long time.
pdf_scaler = .01

# Browser options. Headless is more reliable for screenshots in my experience.
options = webdriver.FirefoxOptions()
options.add_argument('--headless')

# Launch webdriver, navigate to destination
driver = webdriver.Firefox(options=options)
driver.get('https://www.google.com')

# Find full page dimensions regardless of scroll
height = S('Height')
width = S('Width')

# Dynamic setting of PDF page dimensions
print_options = PrintOptions()
print_options.page_height = (height*pdf_scaler)*RATIO_MULTIPLIER
print_options.page_width = (width*pdf_scaler)*RATIO_MULTIPLIER
print_options.shrink_to_fit = True

# Prints to PDF (returns base64 encoded data. Must save)
pdf = driver.print_page(print_options=print_options)
driver.close()

# save the output to a file.
with open('example.pdf', 'wb') as file:
    file.write(base64.b64decode(pdf))

Versions used:

geckodriver 0.31.0
Firefox 113.0.1
selenium==4.9.1
Python 3.11.2
Windows 10  

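Edit: this happens because the units here are centimeters, not inches. 2.5352112676056335 is roughly the inch-to-centimeter conversion factor (the exact value is 2.54) :)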
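
To make the unit issue concrete, here is a minimal sketch of the conversion. It assumes the page dimensions come back in CSS pixels (1 px = 1/96 inch) and that the print page size is expected in centimeters, as the edit above suggests. The helper names css_px_to_cm and pdf_page_size_cm are hypothetical and not part of Selenium, and whether the resulting size yields a single-page PDF on a given Firefox/geckodriver version has not been verified here.

```python
# Illustrative helpers (hypothetical names, not part of Selenium).
CSS_PX_PER_INCH = 96   # CSS defines 1 inch as 96 px
CM_PER_INCH = 2.54     # exact inch-to-centimeter factor


def css_px_to_cm(px):
    """Convert a length in CSS pixels to centimeters."""
    return px / CSS_PX_PER_INCH * CM_PER_INCH


def pdf_page_size_cm(width_px, height_px):
    """Return (width_cm, height_cm) for a page measured in CSS pixels."""
    return css_px_to_cm(width_px), css_px_to_cm(height_px)


if __name__ == "__main__":
    # Example: a page that is 1280 px wide and 4000 px tall
    width_cm, height_cm = pdf_page_size_cm(1280, 4000)
    print(f"{width_cm:.2f} cm x {height_cm:.2f} cm")

    # The answer's constant is close to, but not exactly, 2.54;
    # its reciprocal is about 0.39, which matches the 39% mentioned above.
    print(round(1 / 2.5352112676056335, 2))
```

These values could be assigned to print_options.page_width and print_options.page_height in place of the scaled ones. For comparison, the answer's 0.01 * 2.5352... works out to roughly 0.0254 cm per pixel, slightly below the 2.54 / 96 (about 0.0265) used in this sketch.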


0 votes

Try this:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager
from PIL import Image

def get_page_size(driver):
    # Full document size (scroll dimensions), not just the visible viewport
    return driver.execute_script(
        'return [document.documentElement.scrollWidth, document.documentElement.scrollHeight];')

def scroll_to_bottom(driver):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

def capture_screenshot_as_png(driver, file_path):
    driver.save_screenshot(file_path)

def convert_to_pdf(input_file, output_file):
    # Older Pillow versions cannot write images with an alpha channel to PDF,
    # so convert to RGB first
    image = Image.open(input_file).convert('RGB')
    image.save(output_file, 'PDF', resolution=100.0)

# Set up the Firefox driver with options
options = Options()
options.add_argument('--headless')
options.accept_insecure_certs = True
driver = webdriver.Firefox(options=options, service=Service(GeckoDriverManager().install()))

# Navigate to the webpage
driver.get('https://www.google.com')

# Get the page size
page_size = get_page_size(driver)

# Set the window size
driver.set_window_size(page_size[0], page_size[1])

# Scroll to the bottom to load dynamic content
scroll_to_bottom(driver)

# Capture the full-page screenshot as PNG
png_file_path = 'full_page_screenshot.png'
capture_screenshot_as_png(driver, png_file_path)

# Convert the PNG screenshot to PDF
pdf_file_path = 'full_page_screenshot.pdf'
convert_to_pdf(png_file_path, pdf_file_path)

# Clean up and close the browser
driver.quit()

This code captures a full-page screenshot as a PNG file and then converts it to a PDF file. Adjust the file paths (png_file_path and pdf_file_path) to wherever you want the files saved.
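
As a rough guide to what the resolution argument does when the image is saved as a PDF: to my understanding, Pillow sizes the PDF page as pixels * 72 / resolution points, so the value controls the physical page size, not the pixel content. The sketch below only does that arithmetic. pdf_page_size_points is a hypothetical helper, not a Pillow function, and the formula is an assumption about Pillow's PDF writer that is worth checking against its documentation.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def pdf_page_size_points(width_px, height_px, resolution):
    """Estimate the PDF page size in points for an image saved at `resolution` DPI.

    Assumes the page size is computed as pixels * 72 / resolution.
    """
    return (
        width_px * POINTS_PER_INCH / resolution,
        height_px * POINTS_PER_INCH / resolution,
    )


if __name__ == "__main__":
    # A 1000 x 500 px screenshot saved with resolution=100.0
    print(pdf_page_size_points(1000, 500, 100.0))
```

Under that assumption, passing resolution=96 would make the PDF page match the screenshot's size in CSS pixels (1 px = 1/96 inch), provided the screenshot was taken at a device pixel ratio of 1.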
