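I am new to web development and am trying to build a language translation app in Django that translates uploaded documents. It relies on a series of conversions between pdf and docx. When my code outputs the translated document, the file cannot be opened.
When I check the file type, it is identified as XML and docx, and when I change the extension to docx it can be opened and read by MS Word (but no PDF reader can read it).
When I analyze the file in Python by printing its type and contents, I get NoneType and None.
A working PDF of the file can be found in the mysite/mysite folder, but the file that the reConverter function outputs and that is sent to the browser is the broken one.
I tried converting it manually using: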
wordObj = win32com.client.Dispatch('Word.Application')
docObj = wordObj.Documents.Open(wordFilename)
docObj.SaveAs(pdfFilename, FileFormat=wdFormatPDF)
docObj.Close()
wordObj.Quit()
But I get a CoInitialization error. So I have narrowed the problem down entirely to the reConverter function, which returns NoneType. Here is my code:
from django.shortcuts import render
from django.http import HttpResponse
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_protect
from .models import TranslatedDocument
from .forms import UploadFileForm
from django.core.files.storage import FileSystemStorage
import docx
from pdf2docx import parse
from docx2pdf import convert
import time #remove
# Create your views here.
#pythoncom.CoInitialize()
@csrf_protect
def translateFile(request):
    if request.method == 'POST':
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            uploaded_file = request.FILES['file']
            fs = FileSystemStorage()
            filename = fs.save(uploaded_file.name, uploaded_file)
            uploaded_file_path = fs.path(filename)
            file = (converter(uploaded_file_path))
            response = HttpResponse(file, content_type='application/pdf')
            response['Content-Disposition'] = 'attachment; filename="' + filename + '"'
            return response
    else:
        form = UploadFileForm()
    return render(request, 'translator_upload/upload.html', {'form': form})
def reConverter(inputDocxPath):
    #reconvert docx to pdf
    print('reConverter: '+str(inputDocxPath))
    outputPdfPath = inputDocxPath.replace('.docx', '.pdf')
    test = convert(inputDocxPath, outputPdfPath)
    print(type(test))
    print('test: '+str(test))
    return test
def translateDocx(aDocx, stringOfDocPath):
    #translation logic
    docx_file = stringOfDocPath
    myDoc = docx.Document(docx_file)
    print('translateDocx: '+str(docx_file))
    print('translateDocx: '+str(myDoc))
    for paragraphNum in range(len(myDoc.paragraphs)):
        #TRANSLATION LOGIC
        pass  # placeholder so the elided loop body still parses
    myDoc.save(docx_file)
    return reConverter(docx_file)
    #stringOfDocPath is used as convert() requires file path, not file object(myDoc)
def converter(inputPdfPath):
    # convert pdf to docx
    pdf_file = inputPdfPath
    docx_file = inputPdfPath.replace('.pdf', '.docx')
    print('file types saved: '+docx_file+'. Converting to docx')
    parse(pdf_file, docx_file) #, start=0, end=3)
    myDoc = docx.Document(docx_file)
    print('converter '+str(myDoc))
    return translateDocx(myDoc, docx_file)
docx2pdf.convert always returns None.
The converted PDF is written to the file at outputPdfPath.
To send the PDF to the user, you have to read its contents back from outputPdfPath.
def reConverter(inputDocxPath):
    #reconvert docx to pdf
    print('reConverter: '+str(inputDocxPath))
    outputPdfPath = inputDocxPath.replace('.docx', '.pdf')
    convert(inputDocxPath, outputPdfPath)  # writes the PDF to disk, returns None
    # a PDF is binary data, so open it in "rb" mode rather than text mode
    with open(outputPdfPath, "rb") as f:
        test = f.read()
    print(type(test))
    print('test: '+str(test))
    return test
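
Since the question mentions that the downloaded file was being detected as docx/XML, it may also help to check that what you read back really is a PDF before handing it to HttpResponse. Below is a minimal sketch using only the standard library. The names read_pdf_bytes and pdf_path_for are hypothetical helpers, not part of docx2pdf or Django, and the check only looks at the leading "%PDF-" signature, so treat it as a sanity check rather than full validation.

```python
from pathlib import Path

# A well-formed PDF file begins with this signature.
PDF_MAGIC = b"%PDF-"


def read_pdf_bytes(pdf_path):
    """Hypothetical helper: return the file's raw bytes, or raise if it is not a PDF."""
    # read_bytes() opens the file in binary mode, which a PDF requires
    data = Path(pdf_path).read_bytes()
    if not data.startswith(PDF_MAGIC):
        raise ValueError(f"{pdf_path} does not look like a PDF")
    return data


def pdf_path_for(docx_path):
    """Hypothetical helper: replace only the file extension with .pdf."""
    # with_suffix() touches just the final suffix, unlike str.replace(),
    # which would also rewrite '.docx' if it appeared earlier in the path
    return str(Path(docx_path).with_suffix(".pdf"))
```

Assuming the rest of the view stays as in the question, reConverter could then end with `return read_pdf_bytes(outputPdfPath)` after calling convert(), so a failed conversion shows up as an error on the server instead of as an unreadable download.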