How can I convert a PDF to grayscale from the command line without rasterizing it?

Question (votes: 0, answers: 9)

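I am trying to convert this PDF to grayscale: https://dl.dropboxusercontent.com/u/10351891/page-27.pdf

Ghostscript (v 9.10) with the pdfwrite device fails with the message "Unable to convert color space to Gray, reverting strategy to LeaveColorUnchanged."

I can convert it by going through an intermediate PS file (using gs, pdftops (v 0.24.3) or pdf2ps), but that conversion rasterizes the entire PDF. I have tried many other things: normalizing the PDF with qpdf (v 5.0.1) or pdftk (v 1.44), converting it to an SVG file and back to PDF via Inkscape (v 0.48.4)... Nothing seems to work.

The only solution I have found (and it is not suitable for me in a production environment) is to use Preview on my Mac and apply the Quartz Gray Tone filter, either manually or with an Automator script.

Has anyone found another way that works? Or is it possible to normalize the PDF, to fix the problem so that Ghostscript no longer reports "Unable to convert color space...", or otherwise to force a color space?

Thanks!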

pdf ghostscript grayscale pdf-conversion pdf-manipulation
9 Answers
60
votes
gs \
   -sDEVICE=pdfwrite \
   -sProcessColorModel=DeviceGray \
   -sColorConversionStrategy=Gray \
   -dOverrideICC \
   -o out.pdf \
   -f page-27.pdf

This command converts your file to grayscale (GS 9.10).
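One way to check the result is Ghostscript's inkcov device, which prints the C, M, Y and K ink coverage of each page. The sketch below is not part of the answer above: the helper name color_pages is made up for illustration, and it assumes the usual inkcov line format of four numbers followed by "CMYK OK". It reads that output on stdin and prints the numbers of the pages that still carry C, M or Y ink.

```shell
# Hypothetical helper: reads the output of
#   gs -q -o - -sDEVICE=inkcov file.pdf
# on stdin and prints the 1-based number of every page whose
# cyan, magenta or yellow coverage is non-zero.
color_pages() {
  # Only lines whose fifth field is "CMYK" are page records;
  # anything else (warnings, banners) is ignored.
  awk '$5 == "CMYK" { page++; if ($1 + $2 + $3 > 0) print page }'
}

# Usage (requires Ghostscript):
#   gs -q -o - -sDEVICE=inkcov out.pdf | color_pages
```

If the command prints nothing, every page reported zero C, M and Y coverage, which is what one would expect from a file that really was converted to DeviceGray.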


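11
votes

A bit late in the day, but the top answer did not work for me on a different file. The underlying problem seems to be old code in Ghostscript, with the newer version not enabled by default. More information: http://bugs.ghostscript.com/show_bug.cgi?id=694608

The page above also gives a command that worked for me: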

gs \
  -sDEVICE=pdfwrite \
  -dProcessColorModel=/DeviceGray \
  -dColorConversionStrategy=/Gray \
  -dPDFUseOldCMS=false \
  -o out.pdf \
  -f in.pdf

3
votes

Use the latest code (not yet released) and set ColorConversionStrategy=Gray.


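2
votes

If you crack open the file, you will find that most colors are specified through an RGB ICC-based color space (look for 8 0 R to find all references to that color space). Maybe gs is complaining about this?

Who knows.

The point is that converting a page from one color space to another without affecting the content is not trivial. You need to be able to render the page, capture every change to the current color or color space, and substitute the equivalent in the target space. You also need to convert every image XObject that is in the wrong color space, which means decoding the image data and re-encoding it in the target space. And you need to convert every form XObject, which is a task much like converting the parent page, because form XObjects (I think your document has 4) also contain resources and a content stream of page-marking operators (which may contain more XObjects).

It is certainly doable, but the process is nearly the same as rendering, with some fairly special-purpose code.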
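To see which color spaces a file references, as suggested at the start of this answer, the PDF first has to be uncompressed so its objects are readable as text. The sketch below is only an illustration: the function name list_colorspaces is hypothetical, and it assumes the file was already expanded with qpdf's QDF mode. It counts occurrences of a few common color-space names.

```shell
# Hypothetical helper: counts how often each color-space name occurs in
# a PDF that is read as text. This is only meaningful on an uncompressed
# file, for example one produced by
#   qpdf --qdf --object-streams=disable in.pdf in-qdf.pdf
list_colorspaces() {
  # -a treats the binary PDF as text, -o prints only the matching names.
  grep -a -o -E '/(ICCBased|DeviceRGB|DeviceCMYK|DeviceGray)' "$1" | sort | uniq -c
}

# Usage:
#   list_colorspaces in-qdf.pdf
```

A count for /ICCBased next to few or no /DeviceGray entries would be consistent with the observation above that most colors go through an ICC-based color space. It is a rough text search, not a parse of the resource dictionaries.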


1
votes

A late response, but the following command should work:

convert -colorspace GRAY input.pdf input_gray.pdf
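ImageMagick hands PDF input to Ghostscript and then works on the rendered bitmaps, so it is worth checking whether the output still contains real text. A rough check, sketched here under the assumption that pdffonts from poppler-utils is installed, is to compare the number of fonts before and after. The helper name count_fonts is made up for illustration.

```shell
# Hypothetical helper: reads `pdffonts file.pdf` output on stdin and
# prints how many fonts it lists. The first two lines of pdffonts
# output are a column header and a separator, so they are skipped.
count_fonts() {
  awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Usage (requires poppler-utils):
#   pdffonts input.pdf      | count_fonts
#   pdffonts input_gray.pdf | count_fonts
```

If the original lists fonts and the converted file lists none, the text was most likely turned into an image. A file that had no text to begin with reports zero in both cases, so the comparison only helps for documents that contain text.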

1
votes

On Linux:

Install pdftk:

apt-get install pdftk

Once pdftk is installed, save the following code as a script named graypdf.sh:

#!/bin/bash
# convert pdf to grayscale, preserving metadata
# "AFAIK graphicx has no feature for manipulating colorspaces. " http://groups.google.com/group/latexusersgroup/browse_thread/thread/5ebbc3ff9978af05
# "> Is there an easy (or just standard) way with pdflatex to do a > conversion from color to grayscale when a PDF file is generated? No." ... "If you want to convert a multipage document then you better have pdftops from the xpdf suite installed because Ghostscript's pdf to ps doesn't produce nice Postscript." http://osdir.com/ml/tex.pdftex/2008-05/msg00006.html
# "Converting a color EPS to grayscale" - http://en.wikibooks.org/wiki/LaTeX/Importing_Graphics
# "\usepackage[monochrome]{color} .. I don't know of a neat automatic conversion to monochrome (there might be such a thing) although there was something in Tugboat a while back about mapping colors on the fly. I would probably make monochrome versions of the pictures, and name them consistently. Then conditionally load each one" http://newsgroups.derkeiler.com/Archive/Comp/comp.text.tex/2005-08/msg01864.html
# "Here comes optional.sty. By adding \usepackage{optional} ... \opt{color}{\includegraphics[width=0.4\textwidth]{intro/benzoCompounds_color}} \opt{grayscale}{\includegraphics[width=0.4\textwidth]{intro/benzoCompounds}} " - http://chem-bla-ics.blogspot.com/2008/01/my-phd-thesis-in-color-and-grayscale.html
# with gs:
# http://handyfloss.net/2008.09/making-a-pdf-grayscale-with-ghostscript/
# note - this strips metadata! so:
# http://etutorials.org/Linux+systems/pdf+hacks/Chapter+5.+Manipulating+PDF+Files/Hack+64+Get+and+Set+PDF+Metadata/
COLORFILENAME=$1
OVERWRITE=$2
FNAME=${COLORFILENAME%.pdf}
# NOTE: pdftk does not work with logical page numbers / pagination;
# gs kills it as well;
# so check for existence of 'pdfmarks' file in calling dir;
# if there, use it to correct gs logical pagination
# for example, see
# http://askubuntu.com/questions/32048/renumber-pages-of-a-pdf/65894#65894
PDFMARKS=
if [ -e pdfmarks ] ; then
  PDFMARKS="pdfmarks"
  echo "$PDFMARKS exists, using..."
  # convert to gray pdf - this strips metadata!
  gs -sOutputFile="$FNAME-gs-gray.pdf" -sDEVICE=pdfwrite \
    -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \
    -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH "$COLORFILENAME" "$PDFMARKS"
else # not really needed ?!
  gs -sOutputFile="$FNAME-gs-gray.pdf" -sDEVICE=pdfwrite \
    -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \
    -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH "$COLORFILENAME"
fi
# dump metadata from original color pdf
## pdftk $COLORFILENAME dump_data output $FNAME.data.txt
# also: pdfinfo -meta $COLORFILENAME
# grep to avoid BookmarkTitle/Level/PageNumber:
# (with no output file given, dump_data writes to stdout)
pdftk "$COLORFILENAME" dump_data | grep 'Info\|Pdf' > "$FNAME.data.txt"
# "pdftk can take a plain-text file of these same key/value pairs and update a PDF's Info dictionary to match. Currently, it does not update the PDF's XMP stream."
pdftk "$FNAME-gs-gray.pdf" update_info "$FNAME.data.txt" output "$FNAME-gray.pdf"
# (http://wiki.creativecommons.org/XMP_Implementations : Exempi ... allows reading/writing XMP metadata for various file formats, including PDF ... )
# clean up
rm "$FNAME-gs-gray.pdf"
rm "$FNAME.data.txt"
if [ "$OVERWRITE" = "y" ] ; then
  echo "Overwriting $COLORFILENAME..."
  mv "$FNAME-gray.pdf" "$COLORFILENAME"
fi
# BUT NOTE:
# Mixing TEX & PostScript : The GEX Model - http://www.tug.org/TUGboat/Articles/tb21-3/tb68kost.pdf
# VTEX is a (commercial) extended version of TEX, sold by MicroPress, Inc. Free versions of VTEX have recently been made available, that work under OS/2 and Linux. This paper describes GEX, a fast fully-integrated PostScript interpreter which functions as part of the VTEX code-generator. Unless specified otherwise, this article describes the functionality in the free- ware version of the VTEX compiler, as available on CTAN sites in systems/vtex.
# GEX is a graphics counterpart to TEX. .. Since GEX may exercise subtle influence on TEX (load fonts, or change TEX registers), GEX is op- tional in VTEX implementations: the default oper- ation of the program is with GEX off; it is enabled by a command-line switch.
# \includegraphics[width=1.3in, colorspace=grayscale 256]{macaw.jpg}
# http://mail.tug.org/texlive/Contents/live/texmf-dist/doc/generic/FAQ-en/html/FAQ-TeXsystems.html
# A free version of the commercial VTeX extended TeX system is available for use under Linux, which among other things specialises in direct production of PDF from (La)TeX input. Sadly, it's no longer supported, and the ready-built images are made for use with a rather ancient Linux kernel.
# NOTE: another way to capture metadata; if converting via ghostscript:
# http://compgroups.net/comp.text.pdf/How-to-specify-metadata-using-Ghostscript
# first:
# grep -a 'Keywo' orig.pdf
# /Author(xxx)/Title(ttt)/Subject()/Creator(LaTeX)/Producer(pdfTeX-1.40.12)/Keywords(kkkk)
# then - copy this data in a file prologue.ini:
#/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse
#[/Author(xxx)
#/Title(ttt)
#/Subject()
#/Creator(LaTeX with hyperref package + gs w/ prologue)
#/Producer(pdfTeX-1.40.12)
#/Keywords(kkkk)
#/DOCINFO pdfmark
#
# finally, call gs on the orig file,
# asking to process pdfmarks in prologue.ini:
# gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
# -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -dDOPDFMARKS \
# -sOutputFile=out.pdf in.pdf prologue.ini
# then the metadata will be in output too (which is stripped otherwise;
# note bookmarks are preserved, however). 

Give the file execute permission:

chmod +x graypdf.sh

Then run it like this:

./graypdf.sh input.pdf

It creates a file input-gray.pdf in the same location as the original file.
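The metadata step in the script keeps only the Info and PdfID lines of the pdftk dump so that bookmark records are not fed back through update_info. The sketch below illustrates that filter on made-up sample data. It is a slightly stricter variant, not the script's own pattern: the pattern is anchored to the start of the line, so a bookmark whose title happens to contain "Info" is not kept by accident. The function name keep_info_lines is hypothetical.

```shell
# Hypothetical helper: keeps only the lines of `pdftk file.pdf dump_data`
# output that start with "Info" or "Pdf" (document info keys/values and
# the PDF IDs), dropping page counts and bookmark records.
keep_info_lines() {
  grep -E '^(Info|Pdf)'
}

# Illustrative input; real dump_data output depends on the file and
# on the pdftk version.
printf '%s\n' \
  'InfoBegin' \
  'InfoKey: Title' \
  'InfoValue: Report' \
  'PdfID0: abc' \
  'PdfID1: def' \
  'NumberOfPages: 3' \
  'BookmarkBegin' \
  'BookmarkTitle: Info page' \
  'BookmarkLevel: 1' \
  'BookmarkPageNumber: 1' | keep_info_lines
```

With the unanchored pattern used in the script, the BookmarkTitle line in this sample would also pass the filter, because it contains the string "Info".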


1
votes

gs -dQUIET -dBATCH -dNOPAUSE -r150 -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -dOverrideICC -sOutputFile=output.pdf input.pdf


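0
votes

You can use something I created. It lets you choose specific page numbers to convert to grayscale, which is handy if you do not want to convert the whole PDF. https://github.com/shoaibkhan94/PdfGrayscaler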


0
votes

Use mutool recolor to change the color space of a PDF file.

mutool recolor -c gray -o output.pdf input.pdf

Allowed color spaces: gray (default), rgb and cmyk.

mutool recolor has been part of the MuPDF package since version 1.22.1.
