Unable to install Tesseract 5.0 on AWS Lambda

Question (votes: 0, answers: 2)

I want to run Tesseract 4.0 or Tesseract 5.0 in my AWS Lambda function. My Dockerfile looks like this:

FROM public.ecr.aws/lambda/python:3.8

RUN mkdir app

# Copy function code
COPY / ${LAMBDA_TASK_ROOT}/app


# Install the function's dependencies using file requirements.txt
# from your project folder.

COPY requirements.txt  .
RUN  pip3 install -r requirements.txt --target ${LAMBDA_TASK_ROOT}

RUN rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
RUN yum -y update
RUN yum -y install tesseract

RUN yum install -y poppler-utils

# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "app.com.emlAndMsgParser.mail_parser_test.getEmail_from_msg" ]

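But when I run the build with "docker build -t qa-lambda ." in my terminal, the output shows that Tesseract 3.0 is being installed. When I deploy the resulting Docker image to AWS Lambda, it also reports Tesseract 3.0 as the installed version. I want Tesseract 4.0 or, preferably, Tesseract 5.0.

I tried changing "RUN yum -y install tesseract" in the Dockerfile to "RUN yum -y install tesseract 5.0.0-alpha-320-g8dc3", to "RUN yum -y install tesseract -y", and to "RUN yum -y install tesseract*". All of them still install Tesseract 3.0.

Can anyone tell me where I am going wrong? I am fairly new to Tesseract, so any help is appreciated. Thanks!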
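To confirm which version actually ends up in the image, the check can be scripted rather than read off the build log. The following is a minimal sketch, assuming Python is available in the image (as it is in the Python base image above); the helper names `parse_tesseract_version` and `installed_tesseract_version` are hypothetical and not part of Tesseract or pytesseract. The EPEL 7 repository enabled in the Dockerfile appears to package only a 3.x build, which would explain why every variant of the yum command installs the same version.

```python
import re
import subprocess


def parse_tesseract_version(version_output):
    """Return (major, minor) parsed from the text printed by `tesseract --version`."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return int(match.group(1)), int(match.group(2))


def installed_tesseract_version(binary="tesseract"):
    """Run the binary and parse its version banner.

    stderr is merged into stdout because older releases are reported to
    print the banner on stderr rather than stdout.
    """
    result = subprocess.run(
        [binary, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return parse_tesseract_version(result.stdout)


if __name__ == "__main__":
    major, minor = installed_tesseract_version()
    if major < 4:
        raise SystemExit("Tesseract %d.%d found, expected 4.x or newer" % (major, minor))
    print("Tesseract %d.%d is installed" % (major, minor))
```

Running this script in a RUN step after the install would make the build fail early if the package manager silently falls back to a 3.x package.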

amazon-web-services docker ocr tesseract python-tesseract
2 Answers
3
votes

I ran into the same problem and ended up building Tesseract from source myself:

Dockerfile

FROM public.ecr.aws/lambda/java:11

# Prepare dev tools
RUN yum -y update
RUN yum -y install wget libstdc++ autoconf automake libtool autoconf-archive pkg-config gcc gcc-c++ make libjpeg-devel libpng-devel libtiff-devel zlib-devel
RUN yum group install -y "Development Tools"

# Build leptonica
WORKDIR /opt
RUN wget http://www.leptonica.org/source/leptonica-1.82.0.tar.gz
RUN ls -la
RUN tar -zxvf leptonica-1.82.0.tar.gz
WORKDIR ./leptonica-1.82.0
RUN ./configure
RUN make -j
RUN make install
RUN cd .. && rm leptonica-1.82.0.tar.gz

# Build tesseract
RUN wget https://github.com/tesseract-ocr/tesseract/archive/refs/tags/5.2.0.tar.gz
RUN tar -zxvf 5.2.0.tar.gz
WORKDIR ./tesseract-5.2.0
RUN ./autogen.sh
RUN PKG_CONFIG_PATH=/usr/local/lib/pkgconfig LIBLEPT_HEADERSDIR=/usr/local/include ./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
RUN LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make -j
RUN make install
RUN /sbin/ldconfig
RUN cd .. && rm 5.2.0.tar.gz

# Optional: install language packs
RUN wget https://github.com/tesseract-ocr/tessdata/raw/main/deu.traineddata
RUN wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata
RUN mv *.traineddata /usr/local/share/tessdata

WORKDIR /root

ENTRYPOINT [ "tesseract", "--version" ]

Hope this helps!
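Since the language packs in this Dockerfile are downloaded as separate files, it can be useful to check which ones actually landed in the tessdata directory. A minimal sketch, assuming the `/usr/local/share/tessdata` path used above; the function name `installed_languages` is hypothetical:

```python
from pathlib import Path


def installed_languages(tessdata_dir="/usr/local/share/tessdata"):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    return sorted(path.stem for path in Path(tessdata_dir).glob("*.traineddata"))


if __name__ == "__main__":
    print(installed_languages())
```

The Tesseract command line offers the same information through `tesseract --list-langs`; the sketch only inspects the directory and does not need the binary.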


0
votes

November 2023: to reduce the size of the Docker image, use a multi-stage build.

This is similar to @GiehlMan's answer.

FROM public.ecr.aws/lambda/python:3.10 as builder

# Prepare dev tools
RUN yum -y update
RUN yum -y install wget libstdc++ autoconf automake libtool autoconf-archive pkg-config gcc gcc-c++ make libjpeg-devel libpng-devel libtiff-devel zlib-devel
RUN yum group install -y "Development Tools"

# Build leptonica
WORKDIR /opt
RUN wget http://www.leptonica.org/source/leptonica-1.82.0.tar.gz
RUN ls -la
RUN tar -zxvf leptonica-1.82.0.tar.gz
WORKDIR ./leptonica-1.82.0
RUN ./configure
RUN make -j
RUN make install
RUN cd .. && rm leptonica-1.82.0.tar.gz

# Build tesseract
RUN wget https://github.com/tesseract-ocr/tesseract/archive/refs/tags/5.2.0.tar.gz
RUN tar -zxvf 5.2.0.tar.gz
WORKDIR ./tesseract-5.2.0
RUN ./autogen.sh
RUN PKG_CONFIG_PATH=/usr/local/lib/pkgconfig LIBLEPT_HEADERSDIR=/usr/local/include ./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
RUN make install
RUN /sbin/ldconfig
RUN cd .. && rm 5.2.0.tar.gz

# install language packs
RUN wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata
RUN mv *.traineddata /usr/local/share/tessdata

FROM public.ecr.aws/lambda/python:3.10

# Copy necessary files from the builder stage
COPY --from=builder /usr/local/bin/tesseract /usr/local/bin/tesseract
COPY --from=builder /usr/local/share/tessdata /usr/local/share/tessdata
COPY --from=builder /usr/local/lib/libtesseract* /usr/local/lib/
COPY --from=builder /usr/local/lib/liblept* /usr/local/lib/

# Additional dependencies for Tesseract
COPY --from=builder /usr/lib64/libjpeg.so.62 /usr/lib64/libjpeg.so.62
COPY --from=builder /usr/lib64/libjbig.so.2.0 /usr/lib64/libjbig.so.2.0
COPY --from=builder /usr/lib64/libtiff.so.5 /usr/lib64/libtiff.so.5
COPY --from=builder /usr/lib64/libgomp.so.1 /usr/lib64/libgomp.so.1

ENV PATH="/usr/local/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/lib:/usr/lib64:${LD_LIBRARY_PATH}"

RUN tesseract --version

# lambda handler
COPY requirements.txt ./

RUN pip install --upgrade pip wheel
RUN pip install -r requirements.txt -t .

COPY handler.py ./

# The module name must match the copied file (handler.py)
CMD ["handler.lambda_handler"]
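The Dockerfile copies a handler.py whose contents are not shown. The following is a minimal sketch of what such a handler might look like, assuming the event carries the image as base64 text under a hypothetical `image_base64` key; the `build_command` helper and the `run` parameter are also assumptions made for illustration. It calls the tesseract binary directly, using `stdout` as the output base so that the recognized text is printed instead of written to a file.

```python
import base64
import subprocess
import tempfile


def build_command(image_path, lang="eng"):
    """Build the tesseract CLI call; `stdout` as output base prints the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def lambda_handler(event, context, run=subprocess.run):
    """Decode the image from the event, run OCR on it, and return the text.

    `run` defaults to subprocess.run and is only a parameter so the
    handler can be exercised without the tesseract binary.
    """
    image_bytes = base64.b64decode(event["image_base64"])  # hypothetical event key
    # The temporary file is created in the default temp directory,
    # which is expected to be the writable /tmp on Lambda.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(image_bytes)
        tmp.flush()
        result = run(
            build_command(tmp.name, event.get("lang", "eng")),
            capture_output=True,
            text=True,
            check=True,
        )
    return {"text": result.stdout}
```

With this file saved as handler.py, the image's CMD would need to point at `handler.lambda_handler`. A wrapper library such as pytesseract could replace the subprocess call if it is listed in requirements.txt.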