How to fix "Missing character" warnings when converting docx to pdf with Pandoc and LaTeX?

Question

Goal

I have thousands of .docx files in Khmer and want to convert them to .pdf using Pandoc.

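Background

I installed Pandoc using MacPorts. Pandoc needs LaTeX for PDF conversion, so I installed MacTeX. The installation seems to have gone correctly, and I can convert English .docx files to .pdf without trouble.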

Attempt 1

When I try to convert a Khmer file to PDF (you can find a sample at https://briancroxall.net/pandoc/transcription.docx), I use the following command:

pandoc transcription.docx  -s -o transcript.pdf

I get the following error:

Error producing PDF.
! Package inputenc Error: Unicode character អ (U+17A2)
(inputenc)                not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.64 ...�នៅសម័យប៉ុល ពត។}

Try running pandoc with --pdf-engine=xelatex.

Attempt 2

Following this suggestion, I used this command:

pandoc --pdf-engine=xelatex transcription.docx  -s -o transcript.pdf

Pandoc then emits a warning for every Khmer character in the text:

[WARNING] Missing character: There is no អ in font [lmroman10-bold]:mapping=tex-text;!
[WARNING] Missing character: There is no ្ in font [lmroman10-bold]:mapping=tex-text;!
[WARNING] Missing character: There is no ន in font [lmroman10-bold]:mapping=tex-text;!
...

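This process does produce a PDF (see https://briancroxall.net/pandoc/transcript.pdf), but it is essentially blank.

Question

As far as I can tell, this means the LaTeX engine I'm using for the conversion has no Khmer characters. Is that right, and how can I get this conversion to work?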

Answer 1

mb21's comment helped me figure it out. Since my system has two Khmer fonts installed, I had to set mainfont to use one of them.
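The warnings in Attempt 2 name lmroman10-bold (Latin Modern), which has no Khmer glyphs, so the fix is choosing a font that does. One way to find candidates is the sketch below, assuming fontconfig's fc-list is installed (for example through MacPorts); the function name khmer_fonts is only illustrative. XeTeX on macOS may locate fonts through the system rather than fontconfig, so the two views of installed fonts may not match exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list font families that fontconfig reports as covering
# Khmer (language code "km"). Assumes fc-list from fontconfig is on the PATH.
khmer_fonts() {
  # ":lang=km" filters to fonts supporting Khmer; "family" prints only the family name.
  # sort -u removes duplicates (one font family often has several styles).
  fc-list ':lang=km' family | sort -u
}
```

Running khmer_fonts prints one family per line (fonts with several family names may show them comma-separated); any of those names can be passed as mainfont.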

$ pandoc --pdf-engine=xelatex transcription.docx \
    -V 'mainfont:Khmer MN' -s -o transcription.pdf

This produces a PDF that contains the Khmer characters, with no error messages.
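Since the goal is thousands of files, the command above can be wrapped in a loop. This is a minimal sketch, assuming all .docx files sit in one directory and that "Khmer MN" is the font to use; the function name convert_dir is hypothetical. Each PDF is written next to its source with the same base name.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper: convert every .docx in a directory to PDF using
# XeLaTeX and a Khmer-capable main font (default "Khmer MN", as in the answer).
set -u

convert_dir() {
  local dir="${1:-.}"
  local font="${2:-Khmer MN}"
  local f out
  for f in "$dir"/*.docx; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    out="${f%.docx}.pdf"
    # Report failures but keep going, so one bad file doesn't stop the batch.
    if ! pandoc --pdf-engine=xelatex "$f" -V "mainfont:$font" -s -o "$out"; then
      echo "failed: $f" >&2
    fi
  done
}
```

After sourcing the file, convert_dir ~/khmer-docs converts that folder; a second argument selects a different font family.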

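The PDF does seem to have some problems, though: some Khmer phrases run off the edge of the page. I think this is a word-segmentation issue (Khmer is written without spaces between words) that Word handles but that falls apart when converting to PDF.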
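One thing that may help with the overflow, untested on the sample file: XeTeX's \XeTeXlinebreaklocale primitive asks it to find break points with ICU's line-breaking rules for a locale, which is the usual approach for scripts without spaces such as Thai. Whether ICU's Khmer segmentation is available depends on the XeTeX build, so treat this as an assumption. The sketch writes a small preamble file (the name khmer-breaks.tex and the helper write_khmer_header are hypothetical) for Pandoc's -H (--include-in-header) option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a LaTeX preamble fragment that enables ICU-based
# line breaking for Khmer ("km") in XeTeX. Effect on Khmer text is unverified.
write_khmer_header() {
  local out="${1:-khmer-breaks.tex}"
  cat > "$out" <<'EOF'
% Let XeTeX use ICU line-break rules for Khmer, which has no inter-word spaces
\XeTeXlinebreaklocale "km"
% Allow a little stretch at each inserted break opportunity
\XeTeXlinebreakskip = 0pt plus 1pt
EOF
}
```

After running write_khmer_header, the file is passed alongside the font, e.g. pandoc --pdf-engine=xelatex transcription.docx -V 'mainfont:Khmer MN' -H khmer-breaks.tex -s -o transcription.pdf.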
