Installing Poppler in Google Colab for deepdoctection


I want to run the example code from the README at https://github.com/deepdoctection/deepdoctection in Colab:

import deepdoctection as dd
from IPython.core.display import HTML
from matplotlib import pyplot as plt

analyzer = dd.get_dd_analyzer()  # instantiate the built-in analyzer similar to the Hugging Face space demo

df = analyzer.analyze(path = "/path/to/your/doc.pdf")  # setting up pipeline
df.reset_state()                 # Trigger some initialization

doc = iter(df)
page = next(doc) 

image = page.viz()
plt.figure(figsize = (25,17))
plt.axis('off')
plt.imshow(image)

But I get:

/usr/local/lib/python3.10/dist-packages/deepdoctection/utils/pdf_utils.py in _input_to_cli_str(input_file_name, output_file_name, dpi, size)
    160         command = "pdftocairo"
    161     else:
--> 162         raise PopplerNotFound("Poppler not found. Please install or add to your PATH.")
    163 
    164     if platform.system() == "Windows":

PopplerNotFound: Poppler not found. Please install or add to your PATH.

I have already tried the suggestions from this question and a few others, but they didn't change anything.

python pdf google-colaboratory ocr poppler
1 Answer

Happy to answer your question! :D

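The error you are seeing means that the deepdoctection library needs Poppler (a PDF rendering toolkit) installed on your system but cannot find it. To resolve this in Google Colab, you can follow these steps: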

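  1. Install Poppler in Colab:
!apt-get install -y poppler-utils
  2. Add Poppler to PATH (apt-get places the binaries in /usr/bin, which is normally already on PATH in Colab, so this step is usually a no-op):
import os
os.environ['PATH'] += ":/usr/bin/"
  3. Run the example code: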
import deepdoctection as dd
from IPython.core.display import HTML
from matplotlib import pyplot as plt
import os

# Install Poppler
!apt-get install -y poppler-utils

# Add Poppler to PATH
os.environ['PATH'] += ":/usr/bin/"

analyzer = dd.get_dd_analyzer()  # instantiate the built-in analyzer similar to the Hugging Face space demo

# Use a sample PDF file for testing (replace with your actual path)
pdf_path = "/content/sample.pdf"

df = analyzer.analyze(path=pdf_path)  # setting up pipeline
df.reset_state()  # Trigger some initialization

doc = iter(df)
page = next(doc)

image = page.viz()
plt.figure(figsize=(25, 17))
plt.axis('off')
plt.imshow(image)

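This code installs Poppler, adds it to PATH, and then runs the example code. If you still run into problems, make sure the PDF file path is correct and that the PDF is accessible from the Colab environment.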
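If the error persists, it may help to check both preconditions directly before calling the analyzer. The following is a minimal sketch, not part of deepdoctection: the helper names `find_poppler_tools` and `looks_like_pdf` are hypothetical. The traceback mentions `pdftocairo`; checking `pdftoppm` as well is an assumption, based on both commands shipping in the poppler-utils package.

```python
import shutil
from pathlib import Path


def find_poppler_tools(names=("pdftoppm", "pdftocairo")):
    """Map each command name to its full path, or None if it is not on PATH.

    The default names are an assumption about which Poppler commands are needed.
    """
    return {name: shutil.which(name) for name in names}


def looks_like_pdf(path):
    """Return True if `path` is an existing file that starts with the PDF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


if __name__ == "__main__":
    # Report where each Poppler command was found, if at all.
    for name, location in find_poppler_tools().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
    # Placeholder path taken from the example above; replace with your own file.
    print("PDF looks valid:", looks_like_pdf("/content/sample.pdf"))
```

If a command is reported as not found right after the apt-get step, the installation itself probably failed, and the output of that cell is the place to look. If the commands are found but the error remains, restarting the Colab runtime and re-running the cells is a reasonable next thing to try.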
