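Using Python 3 on Ubuntu 22.04.3 LTS, I need to convert several .html files to .pdf files. I only want the text that is rendered on screen to end up in the PDF (not the HTML source).
I used Python-PDFKit, the HTML-to-PDF wrapper, and installed it first: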
pip install pdfkit
And in the terminal:
sudo apt-get install wkhtmltopdf
Here are two example .html files.
This is the Python script I use to convert all the .html files:
import os
import pdfkit
# Directory where the HTML files are located
os.chdir("/home/abraji/Documentos/Code/chat_multiple_pdfs/TRANSFERENCIA")
diretorio_html = '/home/abraji/Documentos/Code/chat_multiple_pdfs/TRANSFERENCIA'
# List all files in the folder
arquivos_na_pasta = os.listdir(diretorio_html)
# Filter only HTML files
arquivos_html = [arquivo for arquivo in arquivos_na_pasta if arquivo.endswith('.html')]
# Encoding
options = {
    'encoding': "UTF-8"
}
# Iterate over each file and convert it to PDF
for arquivo_html in arquivos_html:
    nome_pdf = arquivo_html.replace(".html", ".pdf")
    with open(arquivo_html) as f:
        pdfkit.from_file(f, nome_pdf, options=options)
But I get this error:
Traceback (most recent call last):
  File "test_html_pdf.py", line 24, in <module>
    pdfkit.from_file(f, nome_pdf, options=options)
  File "/home/abraji/Documentos/Code/chat_multiple_pdfs/.venv/lib/python3.8/site-packages/pdfkit/api.py", line 51, in from_file
    return r.to_pdf(output_path)
  File "/home/abraji/Documentos/Code/chat_multiple_pdfs/.venv/lib/python3.8/site-packages/pdfkit/pdfkit.py", line 201, in to_pdf
    self.handle_error(exit_code, stderr)
  File "/home/abraji/Documentos/Code/chat_multiple_pdfs/.venv/lib/python3.8/site-packages/pdfkit/pdfkit.py", line 155, in handle_error
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Exit with code 1 due to network error: ProtocolUnknownError
Does anyone know how to fix this?
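One thing that may be worth trying with pdfkit itself: ProtocolUnknownError is often reported when the HTML references local resources and wkhtmltopdf refuses to load them. The sketch below assumes that is the cause here, which the traceback alone does not confirm. It passes wkhtmltopdf's `enable-local-file-access` flag and hands pdfkit file paths instead of open file objects. The names `OPTIONS`, `pdf_path_for`, and `convert_folder` are made up for this example.

```python
from pathlib import Path

# Assumption: the error comes from blocked local resources. The
# "enable-local-file-access" flag takes no value, so it maps to "".
OPTIONS = {
    "encoding": "UTF-8",
    "enable-local-file-access": "",
}


def pdf_path_for(html_path):
    """Return the output path: same folder and stem, with a .pdf suffix."""
    return Path(html_path).with_suffix(".pdf")


def convert_folder(folder):
    """Convert every .html file in `folder`, passing paths rather than open files."""
    import pdfkit  # imported lazily so the helpers above work without it

    for html_path in sorted(Path(folder).glob("*.html")):
        pdfkit.from_file(str(html_path), str(pdf_path_for(html_path)), options=OPTIONS)
```

It would be called as `convert_folder("/home/abraji/Documentos/Code/chat_multiple_pdfs/TRANSFERENCIA")`. If the error persists, the HTML may contain links with a scheme wkhtmltopdf does not understand, and those would need to be cleaned up separately.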
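PDFKit usually requires a lot of cleanup to make sure your HTML does not use modern CSS. You are better off using WeasyPrint, and I can definitely help if you run into problems.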
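A minimal sketch of the WeasyPrint route, assuming it is installed with `pip install weasyprint` and that the HTML files sit in a single folder. The function names `html_to_pdf_pairs` and `convert_with_weasyprint` are illustrative, not part of WeasyPrint.

```python
from pathlib import Path


def html_to_pdf_pairs(folder):
    """Return (html_path, pdf_path) pairs for every .html file in `folder`."""
    return [(p, p.with_suffix(".pdf")) for p in sorted(Path(folder).glob("*.html"))]


def convert_with_weasyprint(folder):
    """Render each HTML file to a PDF written next to it."""
    from weasyprint import HTML  # imported lazily; requires WeasyPrint

    for html_path, pdf_path in html_to_pdf_pairs(folder):
        # HTML(filename=...) reads the file from disk; write_pdf renders it.
        HTML(filename=str(html_path)).write_pdf(str(pdf_path))
```

Calling `convert_with_weasyprint("/home/abraji/Documentos/Code/chat_multiple_pdfs/TRANSFERENCIA")` would write one PDF next to each HTML file. The output contains the rendered page, not the HTML source.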