Merging PDFs with reportlab and PyPDF2 loses images and embedded fonts

Question

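I'm trying to take an existing PDF stored on AWS, read it into my backend (Django 1.1, Python 2.7), and add text in the margin. My current code successfully takes the PDF and adds the text to the margin, but it corrupts the PDF: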

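When opened in a browser:

Removes images
Occasionally adds characters between words
Occasionally changes the PDF's character set completely

When opened in Adobe:

Says "Cannot extract the embedded font 'whatever font name'. Some characters may not display or print correctly."
Says "A drawing error occurred"
If there were images before the edit, says "Insufficient data for an image"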

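I made my own test PDFs with and without predefined fonts, and with and without images. The ones with predefined fonts and no images work as expected. With images, Adobe throws "There was an error reading a stream." and the browser simply doesn't show the images. I've concluded that the missing fonts are the cause of the character problems, but I'm not sure why the images aren't showing.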

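I have no control over the content of the PDFs I'm editing, so I can't ensure they use only predefined fonts, and they will definitely need to contain images. Below is my code: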

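from datetime import datetime

from django.core.files.storage import default_storage
from django.http import Http404, HttpResponse
from PyPDF2 import PdfFileWriter, PdfFileReader
from reportlab.pdfgen import canvas
from rest_framework import permissions
from rest_framework.views import APIView
from StringIO import StringIO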

class DownloadMIR(APIView):
    permission_classes = (permissions.IsAuthenticated,)

    def post(self, request, format=None):
        data = request.data

        file_path = "some_path"
        temp_file_path = "some_other_path"

        # read your existing PDF

        if default_storage.exists(file_path):
            existing_pdf = PdfFileReader(default_storage.open(file_path, 'rb'))
        else:
            raise Http404("could not find pdf")

        packet = StringIO()
        # create a new PDF with Reportlab
        can = canvas.Canvas(packet)
        height, width = int(existing_pdf.getPage(0).mediaBox.getUpperRight_x()), int(
            existing_pdf.getPage(0).mediaBox.getUpperRight_y())
        print("width:" + str(width) + " height: " + str(height))
        can.setPageSize([width, height])
        can.rotate(90)
        footer = "Prepared for " + request.user.first_name + " " + request.user.last_name + " on " + datetime.now().strftime('%Y-%m-%d at %H:%M:%S')
        can.setFont("Courier", 8)
        can.drawCentredString(width / 2, -15, footer)
        can.save()

        packet.seek(0)
        new_pdf = PdfFileReader(packet)

        output = PdfFileWriter()
        for index in range(existing_pdf.numPages):
            page = existing_pdf.getPage(index)
            page.mergePage(new_pdf.getPage(0))
            output.addPage(page)
            #print("done page " + str(index))

        response = HttpResponse(content_type="application/pdf")

        response['Content-Disposition'] = 'attachment; filename=' + temp_file_path

        output.write(response)
        return response
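
The variable naming in the overlay step above is easy to misread: `height` receives the page's x extent and `width` its y extent, and the text is drawn after `can.rotate(90)`. The following is a minimal Python 3 sketch of where that text ends up, assuming reportlab's `rotate` is a counterclockwise rotation about the origin. The helper `rotated_to_page` and the US Letter page size are hypothetical, introduced only for illustration.

```python
import math


def rotated_to_page(x, y, degrees):
    """Map a point drawn after a counterclockwise rotation back to page coordinates."""
    theta = math.radians(degrees)
    page_x = x * math.cos(theta) - y * math.sin(theta)
    page_y = x * math.sin(theta) + y * math.cos(theta)
    # Round away floating-point noise from cos(90 degrees).
    return round(page_x, 6), round(page_y, 6)


# Hypothetical portrait US Letter page: 612 x 792 points.
# In the question's code, the variable named `width` holds the y extent (792).
page_y_extent = 792

# The footer is drawn at (width / 2, -15) in the rotated coordinate system.
print(rotated_to_page(page_y_extent / 2, -15, 90))
```

Under these assumptions the footer is centered vertically and sits 15 points in from the left edge, so the overlay geometry itself looks consistent with the stated goal of writing in the margin.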

Using a script I found online, I can see that there are unembedded fonts.

Font List
['/MPDFAA+DejaVuSansCondensed', '/MPDFAA+DejaVuSansCondensed-Bold', '/MPDFAA+DejaVuSansCondensed-BoldOblique', '/MPDFAA+DejaVuSansCondensed-Oblique', '/ZapfDingbats']

Unembedded Fonts
set(['/MPDFAA+DejaVuSansCondensed-Bold', '/ZapfDingbats', '/MPDFAA+DejaVuSansCondensed-BoldOblique', '/MPDFAA+DejaVuSansCondensed', '/MPDFAA+DejaVuSansCondensed-Oblique'])
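
The script that produced this output is not shown. As a rough illustration of how such a check can work, here is a minimal Python 3 sketch that walks a nested dictionary shaped like a PDF resource tree. It records every `/BaseFont` name and treats a font as embedded when its descriptor carries a `/FontFile`, `/FontFile2`, or `/FontFile3` entry. The function `collect_fonts` and the sample `resources` dictionary are hypothetical. A real PDF also contains indirect references and `/Parent` cycles that would need resolving, which this sketch does not attempt.

```python
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def collect_fonts(obj, fonts=None, embedded=None):
    """Recursively walk a dict/list tree, collecting all and embedded font names."""
    if fonts is None:
        fonts, embedded = set(), set()
    if isinstance(obj, dict):
        if "/BaseFont" in obj:
            fonts.add(obj["/BaseFont"])
        # A font descriptor with a font file stream means the font is embedded.
        if "/FontName" in obj and any(key in obj for key in FONT_FILE_KEYS):
            embedded.add(obj["/FontName"])
        for value in obj.values():
            collect_fonts(value, fonts, embedded)
    elif isinstance(obj, (list, tuple)):
        for value in obj:
            collect_fonts(value, fonts, embedded)
    return fonts, embedded


# Hypothetical resource tree: one embedded subset font, one standard font.
resources = {
    "/Font": {
        "/F1": {
            "/BaseFont": "/MPDFAA+DejaVuSansCondensed",
            "/FontDescriptor": {
                "/FontName": "/MPDFAA+DejaVuSansCondensed",
                "/FontFile2": "<font stream placeholder>",
            },
        },
        "/F2": {"/BaseFont": "/ZapfDingbats"},
    }
}

fonts, embedded = collect_fonts(resources)
unembedded = fonts - embedded
print(sorted(unembedded))
```

A check like this only reports what it can reach in the tree, so a font listed as unembedded may simply have its descriptor somewhere the walker did not follow.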

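So my questions are: is there a way to extract the embedded fonts from the original PDF and embed them in the new PDF, and am I doing something wrong that prevents the images from being embedded?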

Answer 1

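After some testing, I found that the problem is not with the generated PDF but with returning the PDF as a response. If I save my PDF to the bucket and download it with the AWS CLI, it works. I haven't figured out how to fix the response so that the PDF is sent back to the front end correctly.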
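
One plausible explanation, not confirmed in this thread, is that the client reads the binary response body as text. The Python 3 sketch below simulates that: it decodes a PDF-like byte string as UTF-8 with replacement characters and encodes it again. The helper `read_as_text_then_save` and the sample bytes are hypothetical stand-ins for a front end that treats the response as a string.

```python
def read_as_text_then_save(raw: bytes) -> bytes:
    """Simulate a client that decodes a binary body as UTF-8 text, then re-encodes it."""
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


# Hypothetical PDF fragment: an ASCII header followed by a binary comment line,
# standing in for compressed image or font stream data.
pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"
mangled = read_as_text_then_save(pdf_bytes)

# The ASCII parts survive, but the binary bytes are replaced and the length changes.
print(len(pdf_bytes), len(mangled))
print(mangled == pdf_bytes)
```

This would fit the symptoms in the question: plain text mostly survives, while image and font streams no longer match their declared lengths. Requesting the response as a blob, as Answer 2 does, avoids the text decoding step.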

Answer 2

A late answer, but others may find it helpful.

Reference: https://www.edureka.co/community/179150/download-pdf-file-using-jquery-ajax

$.ajax({
    type: 'POST',
    url: pdf_report,
    data: data,
    contentType: false,
    processData: false,
    xhr: function () {
        var xhr = new XMLHttpRequest();
        xhr.onreadystatechange = function () {
            // Once the headers have arrived (readyState 2), choose the response type:
            // a binary blob on success, plain text for error messages.
            if (xhr.readyState == 2) {
                if (xhr.status == 200) {
                    xhr.responseType = "blob";
                } else {
                    xhr.responseType = "text";
                }
            }
        };
        return xhr;
    },
    success: function (data) {
        // Wrap the binary response in a Blob object.
        var blob = new Blob([data], { type: "application/octet-stream" });
        var fileName = "yourfile.pdf";

        // Check the browser type and download the file.
        var isIE = false || !!document.documentMode;
        if (isIE) {
            window.navigator.msSaveBlob(blob, fileName);
        } else {
            var url = window.URL || window.webkitURL;
            var link = url.createObjectURL(blob);
            var a = $("<a />");
            a.attr("download", fileName);
            a.attr("href", link);
            $("body").append(a);
            a[0].click();
            // Remove the temporary link once the download has been triggered.
            a.remove();
        }
    }
});