Why does my docx converter return None?


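I'm new to web development and am trying to build a language translation app in Django that translates uploaded documents. It relies on a chain of conversions between pdf and docx. When my code outputs the translated document, the file cannot be opened.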

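  1. When I check the file type, it is identified as XML and docx. If I change the extension to .docx, MS Word can open and read it (but no PDF reader can).

  2. When I inspect the file from Python by printing its type and contents, I get NoneType and None.

  3. A working PDF of the file does exist in the mysite/mysite folder, but the file that the reConverter function outputs and that is sent to the browser is the broken one.

  4. I tried converting it manually with: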

wordObj = win32com.client.Dispatch('Word.Application')
docObj = wordObj.Documents.Open(wordFilename)
docObj.SaveAs(pdfFilename, FileFormat=wdFormatPDF)
docObj.Close()
wordObj.Quit()

but that raised a CoInitialization error. So I have narrowed the problem down to the reConverter function, which returns NoneType. Here is my code:

from django.shortcuts import render
from django.http import HttpResponse
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_protect
from .models import TranslatedDocument
from .forms import UploadFileForm
from django.core.files.storage import FileSystemStorage
import docx
from pdf2docx import parse
from docx2pdf import convert
import time #remove


# Create your views here.

#pythoncom.CoInitialize()
@csrf_protect
def translateFile(request) :
    if request.method == 'POST':
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            uploaded_file = request.FILES['file']
            fs = FileSystemStorage()
            filename = fs.save(uploaded_file.name, uploaded_file)
            uploaded_file_path = fs.path(filename)

            file = (converter(uploaded_file_path))
        response = HttpResponse(file, content_type='application/pdf')
        response['Content-Disposition'] = 'attachment; filename="' + filename + '"'
        return response
    
    else:
        form = UploadFileForm()
    return render(request, 'translator_upload/upload.html', {'form': form})


def reConverter(inputDocxPath):
    #reconvert docx to pdf
    
    print('reConverter: '+str(inputDocxPath))
    outputPdfPath = inputDocxPath.replace('.docx', '.pdf')
    test = convert(inputDocxPath, outputPdfPath)
    print(type(test))
    print('test: '+str(test))
    return test

def translateDocx(aDocx, stringOfDocPath):
    #translation logic
    docx_file = stringOfDocPath
    myDoc = docx.Document(docx_file)
    print('translateDocx: '+str(docx_file))
    print('translateDocx: '+str(myDoc))
    for paragraphNum in range(len(myDoc.paragraphs)):

    #TRANSLATION LOGIC


    myDoc.save(docx_file)
    return reConverter(docx_file)


    
#stringOfDocPath is used as convert() requires file path, not file object(myDoc)

def converter(inputPdfPath):
    # convert pdf to docx
    
    pdf_file = inputPdfPath
    docx_file = inputPdfPath.replace('.pdf', '.docx')
    print('file types saved: '+docx_file+'. Converting to docx')


    parse(pdf_file, docx_file) #,  start=0, end=3)
    myDoc = docx.Document(docx_file)
    print('converter '+str(myDoc))
    return translateDocx(myDoc, docx_file)
python django pdf ms-word pywin32
1 Answer

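docx2pdf.convert always returns None.

It does not return the converted PDF; it saves it to the file at "outputPdfPath".

To send the PDF to the user, you have to read it back from "outputPdfPath" and return those bytes.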

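def reConverter(inputDocxPath):
    # Reconvert the docx to pdf.
    
    print('reConverter: '+str(inputDocxPath))
    outputPdfPath = inputDocxPath.replace('.docx', '.pdf')
    convert(inputDocxPath, outputPdfPath)  # writes the PDF to disk and returns None
    # A PDF is binary data, so open it with "rb"; text mode ("r") can raise a decode error.
    with open(outputPdfPath, "rb") as f:
        test = f.read()
    print(type(test))
    print('test: '+str(len(test))+' bytes')
    return test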
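
Since the question mentions that the downloaded file was identified as docx rather than PDF, it may help to check the bytes before handing them to HttpResponse. The following is only a minimal sketch using the standard library: the helper names sniff_kind and read_converted_pdf are hypothetical (they are not part of docx2pdf or Django), and it assumes the usual conventions that a PDF begins with "%PDF-" and that a .docx, being a ZIP archive, begins with "PK\x03\x04".

```python
from pathlib import Path

# Assumed leading bytes ("magic numbers") for the two formats involved.
PDF_MAGIC = b"%PDF-"        # PDF files normally start with this header
ZIP_MAGIC = b"PK\x03\x04"   # .docx files are ZIP archives and normally start with this


def sniff_kind(data: bytes) -> str:
    """Guess the container format from the leading bytes (hypothetical helper)."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(ZIP_MAGIC):
        return "zip/docx"
    return "unknown"


def read_converted_pdf(pdf_path: str) -> bytes:
    """Read the converter's output file and fail loudly if it is not a PDF."""
    data = Path(pdf_path).read_bytes()  # binary read, equivalent to open(path, "rb")
    kind = sniff_kind(data)
    if kind != "pdf":
        raise ValueError(f"{pdf_path} looks like {kind}, not pdf")
    return data
```

In reConverter, the with open(...) block could then be replaced by test = read_converted_pdf(outputPdfPath), so that a wrong file type raises an error on the server instead of producing a download that no PDF reader can open.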