How to convert a pptx file to a PDF file in Databricks (Linux) while keeping font colors and images

Question

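I am trying to convert a pptx file to PDF in Databricks while preserving the design and structure of the original pptx file. From what I found online, this can be done on Windows with the comtypes and win32 libraries, but because Databricks is built on Linux, those libraries are not available. So I used python-pptx and reportlab instead and managed to convert the pptx file to a PDF.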

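The problems are:

The output file contains only text, with no images and no colors (I don't know how to carry these over with the python-pptx library).
The output file does not include text that is inside shapes.
Text from multiple slides ends up on a single page.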
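The second and third problems can be illustrated with a small sketch. It assumes the missing text lives in group shapes or tables, which a flat loop over `slide.shapes` does not descend into; that is a guess about the input file, not something the question confirms. The helper name `iter_shape_text` is hypothetical, and the demo uses stand-in objects rather than real python-pptx shapes so it runs without the library.

```python
from types import SimpleNamespace


def iter_shape_text(shapes):
    """Yield text from shapes, descending into groups and tables."""
    for shape in shapes:
        if hasattr(shape, "shapes"):
            # Group shape: its children are not visited by a flat loop.
            yield from iter_shape_text(shape.shapes)
        elif getattr(shape, "has_table", False):
            # Table: the text is stored per cell, not on the shape itself.
            for row in shape.table.rows:
                for cell in row.cells:
                    if cell.text:
                        yield cell.text
        elif getattr(shape, "has_text_frame", False) and shape.text_frame.text:
            yield shape.text_frame.text


# Stand-in objects (NOT real python-pptx shapes) to show the traversal.
text_box = SimpleNamespace(
    has_text_frame=True, text_frame=SimpleNamespace(text="Title")
)
picture = SimpleNamespace(has_text_frame=False)
group = SimpleNamespace(
    shapes=[
        SimpleNamespace(
            has_text_frame=True,
            text_frame=SimpleNamespace(text="Inside group"),
        )
    ]
)
demo = list(iter_shape_text([text_box, picture, group]))
```

With a real presentation, the same helper would be called as `iter_shape_text(slide.shapes)` for each slide. To keep one slide per PDF page, a `PageBreak()` from `reportlab.platypus` can be appended to the element list after each slide's paragraphs. This sketch covers text only; it does not reproduce images, colors, or slide layout.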

Here is my code:

from xml.sax.saxutils import escape

from pptx import Presentation
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer

# Load the presentation from DBFS
pptx_file_path = '/dbfs/input.pptx'
presentation = Presentation(pptx_file_path)

# Collect the plain text of each slide (top-level shapes only)
slide_contents = []
for slide in presentation.slides:
    slide_text = ""
    for shape in slide.shapes:
        if hasattr(shape, "text"):
            slide_text += shape.text + "\n"
    slide_contents.append(slide_text)

# Create a PDF using reportlab
pdf_file_path = '/dbfs/output.pdf'
doc = SimpleDocTemplate(pdf_file_path, pagesize=letter)
styles = getSampleStyleSheet()
elements = []
for content in slide_contents:
    # Paragraph parses its input as markup, so escape &, < and >
    elements.append(Paragraph(escape(content), styles['Normal']))
    elements.append(Spacer(1, 12))
doc.build(elements)
