This question is about automatically converting uploaded files with the LibreOffice headless converter. The conversion fails with this error:
LibreOffice 7 fatal error - Application cannot be started
Ubuntu version: 21.04
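What I tried: I fetch the file from Azure Blob Storage, save it into BASE_DIR/input_files, convert it to PDF with a Linux command that I run through subprocess, and write the result into the BASE_DIR/output_files folder.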
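One commonly reported cause of "Application cannot be started" in containers is that LibreOffice cannot create or lock its user profile, for example because HOME is not writable or because two conversions run at once and share one profile. The sketch below is an assumption-based diagnostic, not a confirmed fix: it gives each run a private, throwaway profile through LibreOffice's `-env:UserInstallation` option and captures stderr so the underlying message becomes visible. The helper names `build_command` and `convert` are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def build_command(source_file: str, output_folder: str, profile_dir: str) -> List[str]:
    """Headless PDF conversion command that uses a private LibreOffice user profile."""
    profile_url = Path(profile_dir).resolve().as_uri()  # e.g. file:///tmp/lo_profile
    return [
        "soffice",
        "--headless",
        # Assumption: the default profile (under $HOME) is not usable in the container
        f"-env:UserInstallation={profile_url}",
        "--convert-to", "pdf",
        "--outdir", output_folder,
        source_file,
    ]


def convert(source_file: str, output_folder: str) -> str:
    """Run the conversion with a throwaway profile and return LibreOffice's stdout."""
    with tempfile.TemporaryDirectory() as profile_dir:
        result = subprocess.run(
            build_command(source_file, output_folder, profile_dir),
            capture_output=True,
            text=True,
        )
    if result.returncode != 0:
        # Surface the real message instead of only the generic fatal error
        raise RuntimeError(
            f"LibreOffice exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```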
Here is my code.
This is how I install LibreOffice in the Docker image:
RUN apt-get update \
    && ACCEPT_EULA=Y apt-get -y install libreoffice
Main logic:
import os
import subprocess

# container_client, filename, data and BASE_DIR are defined elsewhere in the app
blob_client = container_client.get_blob_client(f"Folder_with_reports/")
source_file = os.path.join(BASE_DIR, f"input_files/{filename}")  # original docs here
with open(source_file, "wb") as f:
    f.write(data)
output_folder = os.path.join(BASE_DIR, "output_files")  # pdf files will be here
# assign the command of converting files through LibreOffice
command = rf"lowriter --headless --convert-to pdf {source_file} --outdir {output_folder}"
# running the command
subprocess.run(command, shell=True)
# reading the file and uploading it back to Azure Storage
with open(os.path.join(BASE_DIR, f"output_files/MyFile.pdf"), "rb") as outp_file:
    outp_data = outp_file.read()
blob_name_ = f"test"
container_client.upload_blob(name=blob_name_, data=outp_data, blob_type="BlockBlob")
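LibreOffice names its output after the input file, replacing the extension with `.pdf` (`report.docx` becomes `report.pdf` inside `--outdir`), so a hard-coded `MyFile.pdf` only matches an input named `MyFile.*`. A minimal sketch of deriving the path instead; `expected_pdf_path` is a hypothetical helper name:

```python
import os


def expected_pdf_path(source_file: str, output_folder: str) -> str:
    """Path of the PDF that LibreOffice writes for source_file into output_folder."""
    stem = os.path.splitext(os.path.basename(source_file))[0]  # "report.docx" -> "report"
    return os.path.join(output_folder, stem + ".pdf")
```

Only the last extension is replaced, so `q1.report.docx` maps to `q1.report.pdf`.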
Should I install lowriter instead of LibreOffice? Is it fine to use BASE_DIR for this kind of operation? I would appreciate any advice.
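On the `lowriter` question: `lowriter` is a launcher script that starts LibreOffice in Writer mode, and it normally comes with the LibreOffice packages rather than replacing them; `libreoffice` and `soffice` accept the same `--headless --convert-to` options. Instead of hard-coding one name, a small check like this (a sketch; `find_launcher` and the candidate order are assumptions) shows which launcher the image actually provides:

```python
import shutil
from typing import Iterable, Optional

# Assumed preference order; all three are expected to accept --headless --convert-to
DEFAULT_CANDIDATES = ("soffice", "libreoffice", "lowriter")


def find_launcher(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

If this returns `None` inside the container, LibreOffice is not installed (or not on PATH), which would also explain a failing `subprocess.run`.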
Partial solution:
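To narrow the problem down, I simplified the setup and built a separate Docker image from the Dockerfile below. I tried two methods: unoconv and direct conversion with LibreOffice.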
Dockerfile:
FROM ubuntu:21.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get -y upgrade && \
apt-get -y install python3.10 && \
apt update && apt install python3-pip -y
# Method1 - installing LibreOffice and java
RUN apt-get --no-install-recommends install libreoffice -y
RUN apt-get install -y libreoffice-java-common
# Method2 - additionally installing unoconv
RUN apt-get install -y unoconv
ARG CACHEBUST=1
ADD BASE.py /code/BASE.py
# copying input doc/docx files into the image's filesystem
COPY /input_files /code/input_files
CMD ["/code/BASE.py"]
ENTRYPOINT ["python3"]
BASE.py:
import os
import subprocess

BASE_DIR = "/code"
input_folder = os.path.join(BASE_DIR, "input_files")    # original doc/docx files
output_folder = os.path.join(BASE_DIR, "output_files")  # pdf files will be here
# subprocess.run(["ls", input_folder])
for filename in os.listdir(input_folder):
    source_file = os.path.join(input_folder, filename)  # original document
    output_filename = os.path.splitext(filename)[0] + ".pdf"
    output_file = os.path.join(output_folder, output_filename)
    # METHOD 1 - LibreOffice directly
    # assign the command of converting files through LibreOffice
    convert_to_pdf = rf"libreoffice --headless --convert-to pdf {source_file} --outdir {output_folder}"
    subprocess.run(convert_to_pdf, shell=True)
    subprocess.run(["ls", output_folder])
    ## METHOD 2 - Using unoconv - also working
    # convert_to_pdf = f"unoconv -f pdf {source_file}"
    # subprocess.run(convert_to_pdf, shell=True)
    # print(f"file {filename} converted")
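Because `--convert-to` accepts several input files in one call, the loop could also start LibreOffice once for the whole folder instead of once per file, which saves startup time and avoids several instances competing for the same profile. A sketch, with `batch_command` as a hypothetical helper:

```python
from typing import Iterable, List


def batch_command(source_files: Iterable[str], output_folder: str,
                  launcher: str = "libreoffice") -> List[str]:
    """Build a single headless LibreOffice call that converts all source_files to PDF."""
    files = list(source_files)
    if not files:
        # Nothing to convert, so refuse early instead of starting LibreOffice
        raise ValueError("no input files to convert")
    return [launcher, "--headless", "--convert-to", "pdf", "--outdir", output_folder, *files]
```

Usage, for example: `subprocess.run(batch_command(sorted(os.path.join("/code/input_files", f) for f in os.listdir("/code/input_files")), "/code/output_files"), check=True)`.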
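This works if the files are already in the container's Linux filesystem when the image is built. However, I still haven't found a way to write files into the filesystem after the Docker image has been built.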
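The files do not have to exist at build time: a running container can write to its own filesystem (for example a temporary directory) or to a mounted volume. A minimal sketch, assuming the document arrives as bytes (as `data` does in the Azure snippet) and that `libreoffice` is on PATH; `convert_bytes_to_pdf` and its `run` parameter are hypothetical, and `run` exists only so the conversion step can be swapped out in tests:

```python
import os
import subprocess
import tempfile
from typing import Callable


def convert_bytes_to_pdf(data: bytes, filename: str,
                         run: Callable = subprocess.run) -> bytes:
    """Write an uploaded document to a temp dir at runtime, convert it, return PDF bytes."""
    with tempfile.TemporaryDirectory() as workdir:
        source_file = os.path.join(workdir, os.path.basename(filename))
        with open(source_file, "wb") as f:
            f.write(data)
        # LibreOffice writes <name>.pdf next to the input because --outdir is workdir
        run(
            ["libreoffice", "--headless", "--convert-to", "pdf",
             "--outdir", workdir, source_file],
            check=True,
        )
        pdf_file = os.path.splitext(source_file)[0] + ".pdf"
        with open(pdf_file, "rb") as f:
            return f.read()  # read before the temp dir is deleted
```

The returned bytes can then be passed to `container_client.upload_blob(...)` as in the original code. Note that an input that is already a `.pdf` would collide with its own output name in this sketch.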
I built something similar: an API that uses unoserver and LibreOffice to convert files into images for previews/thumbnails. See: https://github.com/Nowi5/file-preview-api