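I'm trying to parse a PDF into text and have been trying to get started with Slate.
However, just following the basic example that is posted everywhere, I get the following: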
>>> import slate
>>> with open('pytest.PDF') as fp:
...     doc = slate.PDF(fp)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/slate/slate.py", line 52, in __init__
    self.append(self.interpreter.process_page(page))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/slate/slate.py", line 36, in process_page
    self.device.outfp.buf = ''
AttributeError: 'cStringIO.StringO' object has no attribute 'buf'
Any ideas?
This can be fixed by changing line 36 of slate.py, where the error occurs, to:

    self.device.outfp.truncate(0)
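To illustrate the idea behind the fix: the failing line tries to empty the output buffer by assigning to a `buf` attribute, which the `cStringIO` object in the traceback does not have, whereas `truncate(0)` empties the buffer through its public interface. Below is a minimal sketch of that pattern, not Slate's actual code. It uses Python 3's `io.StringIO` as a stand-in for the buffer, and `reset_buffer` is a hypothetical helper name. One caveat I'm sure of for `io.StringIO`: `truncate` does not move the stream position, so the sketch calls `seek(0)` first.

```python
import io


def reset_buffer(buf):
    """Empty an in-memory text buffer in place and return what it held.

    Hypothetical helper for illustration; not part of Slate.
    """
    contents = buf.getvalue()
    buf.seek(0)      # move the write position back to the start
    buf.truncate(0)  # drop everything in the buffer
    return contents


# Simulate collecting text page by page through one shared buffer.
outfp = io.StringIO()

outfp.write("page one text")
first = reset_buffer(outfp)

outfp.write("page two")
second = reset_buffer(outfp)

print(first)
print(second)
```

Each call returns only the text written since the previous reset, which is what reusing one buffer for successive pages needs.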