How to convert multiple HTML files to PDF - pdfkit


Using Python 3 on Ubuntu 22.04.3 LTS, I need to convert several .HTML files to .PDF files. I only want the text that is displayed on screen to end up in the PDF, not the HTML code.
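wkhtmltopdf renders each page the way a browser would, so the PDF it produces already shows the displayed text rather than the HTML source. If only the plain visible text is wanted, with no layout at all, the standard library's `html.parser` can extract it. This is a minimal sketch, assuming that skipping the contents of `<script>` and `<style>` is a good enough approximation of "what is displayed" (it ignores CSS such as `display: none`). `VisibleTextExtractor` and `visible_text` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        # convert_charrefs=True (the default) turns "&amp;" into "&" before handle_data
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a skipped element
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    """Return the visible text of an HTML document, one text node per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

The output is plain text, not a PDF, so a renderer (pdfkit, WeasyPrint, or similar) is still needed if the result must be a PDF.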

I used Python-PDFKit (HTML to PDF wrapper) and first installed it:

pip install pdfkit

Then, in the terminal:

sudo apt-get install wkhtmltopdf

Here are two sample .HTML files

This is the Python script I use to convert all the .HTML files:

import os
import pdfkit

# Directory where the HTML files are located
os.chdir("/home/abraji/Documentos/Code/chat_multiple_pdfs/TRANSFERENCIA")

diretorio_html = '/home/abraji/Documentos/Code/chat_multiple_pdfs/TRANSFERENCIA'

# List all files in the folder
arquivos_na_pasta = os.listdir(diretorio_html)

# Filter only HTML files
arquivos_html = [arquivo for arquivo in arquivos_na_pasta if arquivo.endswith('.html')]

# Encoding
options = {
    'encoding': "UTF-8"
}

# Iterate over each file and convert it to PDF
for arquivo_html in arquivos_html:
    nome_pdf = arquivo_html.replace(".html", ".pdf")
    with open(arquivo_html) as f:
        pdfkit.from_file(f, nome_pdf, options=options)
    

But I get this error:

Traceback (most recent call last):
  File "test_html_pdf.py", line 24, in <module>
    pdfkit.from_file(f, nome_pdf, options=options)
  File "/home/abraji/Documentos/Code/chat_multiple_pdfs/.venv/lib/python3.8/site-packages/pdfkit/api.py", line 51, in from_file
    return r.to_pdf(output_path)
  File "/home/abraji/Documentos/Code/chat_multiple_pdfs/.venv/lib/python3.8/site-packages/pdfkit/pdfkit.py", line 201, in to_pdf
    self.handle_error(exit_code, stderr)
  File "/home/abraji/Documentos/Code/chat_multiple_pdfs/.venv/lib/python3.8/site-packages/pdfkit/pdfkit.py", line 155, in handle_error
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Exit with code 1 due to network error: ProtocolUnknownError
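Independently of the network error, three details of the loop above can cause trouble: `endswith('.html')` skips files whose extension is upper-case `.HTML`, `str.replace` rewrites every `.html` in a name rather than just the extension, and one failing file raises `OSError` (as in the traceback) and stops the whole batch. This is a minimal sketch using `pathlib` that matches the extension case-insensitively and records failures per file instead. `html_to_pdf_pairs` and `convert_all` are hypothetical helpers, and the actual converter is passed in as a function.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def html_to_pdf_pairs(folder: str) -> List[Tuple[Path, Path]]:
    """Pair each .html file (any letter case) in `folder` with a sibling .pdf path."""
    return [
        (p, p.with_suffix(".pdf"))  # with_suffix only replaces the final extension
        for p in sorted(Path(folder).iterdir())
        if p.is_file() and p.suffix.lower() == ".html"
    ]


def convert_all(folder: str, convert: Callable[[str, str], object]) -> Dict[str, str]:
    """Call convert(html_path, pdf_path) for every pair and collect failures
    (file name -> error message) instead of stopping at the first one."""
    failures: Dict[str, str] = {}
    for html_path, pdf_path in html_to_pdf_pairs(folder):
        try:
            convert(str(html_path), str(pdf_path))
        except OSError as exc:
            # pdfkit raises IOError, which is an alias of OSError in Python 3
            failures[html_path.name] = str(exc)
    return failures
```

With pdfkit it could be called as `convert_all(diretorio_html, lambda src, dst: pdfkit.from_file(src, dst, options=options))`. Passing the file path instead of an open file object should let wkhtmltopdf resolve relative links against the file's own folder. Adding `'enable-local-file-access': None` to `options` is a frequently suggested remedy for `ProtocolUnknownError` with local resources, but whether it fixes these particular files is untested.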

Does anyone know how to fix this?

html pdf wkhtmltopdf pdfkit
1 answer

0 votes

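PDFkit usually needs a lot of cleanup to make sure your HTML doesn't use modern CSS. You'd be better off using WeasyPrint, and if you run into problems, I can definitely help.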
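This is a minimal sketch of the same batch conversion with WeasyPrint. It assumes WeasyPrint is installed (`pip install weasyprint`, which depends on the Pango system library). `convert_folder_with_weasyprint` is a hypothetical name. `HTML(filename=...)` lets relative images and stylesheets resolve against each file's folder, and `write_pdf` renders the page, so the PDF shows the displayed text rather than the markup.

```python
from pathlib import Path
from typing import List


def convert_folder_with_weasyprint(folder: str) -> List[Path]:
    """Render every .html file (any letter case) in `folder` to a PDF next to it.
    Returns the paths of the PDFs that were written."""
    html_files = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".html"
    )
    if not html_files:
        return []  # nothing to convert, so WeasyPrint is never loaded

    # Deferred import: only needed once there is something to render
    from weasyprint import HTML

    written: List[Path] = []
    for html_path in html_files:
        pdf_path = html_path.with_suffix(".pdf")
        # filename= also sets the base URL used for relative links
        HTML(filename=str(html_path), encoding="utf-8").write_pdf(str(pdf_path))
        written.append(pdf_path)
    return written
```

Because the import is deferred, a folder with no HTML files returns an empty list even on a machine where WeasyPrint is not installed.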
