How do I download NLTK packages with the proper SSL certificates in a Docker container?


I have tried every combination mentioned here and elsewhere, but I keep running into the same error.

Here is my Dockerfile:

FROM python:3.9

RUN pip install virtualenv && virtualenv venv -p python3
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt

RUN git clone https://github.com/facebookresearch/detectron2.git
RUN python -m pip install -e detectron2

# Install dependencies
RUN apt-get update && apt-get install libgl1 -y
RUN pip install -U nltk
RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]

COPY . /app

# Run the application:
CMD ["python", "-u", "app.py"]
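
One detail worth noting (my assumption about part of the problem, based on the "Searched in" list NLTK prints in the error below): /usr/local/nltk_data is not one of NLTK's default search locations, so even a successful build-time download would not be found at runtime unless the NLTK_DATA environment variable points at it. A minimal sketch of that change to the Dockerfile above:

```dockerfile
# Point NLTK at the build-time download location; /usr/local/nltk_data
# is not on NLTK's default search path (compare the "Searched in" list
# printed by the LookupError).
ENV NLTK_DATA=/usr/local/nltk_data
RUN pip install -U nltk
RUN python3 -c "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')"
```

Alternatively, downloading into one of the default locations (e.g. /usr/local/share/nltk_data) should work without the extra environment variable.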

The Docker image builds fine (I am passing the platform argument because I am building the image to run on Linux, while the local machine I am building on is Windows, and the detectron2 library does not install on Windows):

>>> docker buildx build --platform=linux/amd64 -t my_app .
[+] Building 23.2s (16/16) FINISHED
 => [internal] load .dockerignore                                                                                  0.0s
 => => transferring context: 2B                                                                                    0.0s
 => [internal] load build definition from Dockerfile                                                               0.0s
 => => transferring dockerfile: 634B                                                                               0.0s
 => [internal] load metadata for docker.io/library/python:3.9                                                      0.9s
 => [internal] load build context                                                                                  0.0s
 => => transferring context: 1.85kB                                                                                0.0s
 => [ 1/11] FROM docker.io/library/python:3.9@sha256:6ea9dafc96d7914c5c1d199f1f0195c4e05cf017b10666ca84cb7ce8e269  0.0s
 => CACHED [ 2/11] RUN pip install virtualenv && virtualenv venv -p python3                                        0.0s
 => CACHED [ 3/11] WORKDIR /app                                                                                    0.0s
 => CACHED [ 4/11] COPY requirements.txt ./                                                                        0.0s
 => CACHED [ 5/11] RUN pip install -r requirements.txt                                                             0.0s
 => CACHED [ 6/11] RUN git clone https://github.com/facebookresearch/detectron2.git                                0.0s
 => CACHED [ 7/11] RUN python -m pip install -e detectron2                                                         0.0s
 => CACHED [ 8/11] RUN apt-get update && apt-get install libgl1 -y                                                 0.0s
 => CACHED [ 9/11] RUN pip install -U nltk                                                                         0.0s
 => [10/11] RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]   22.1s
 => [11/11] COPY . /app                                                                                            0.0s
 => exporting to image                                                                                             0.1s
 => => exporting layers                                                                                            0.1s
 => => writing image sha256:83e2495addbc4cdf9b0885e1bb4c5b0fb0777177956eda56950bbf59c095d23b                       0.0s
 => => naming to docker.io/library/my_app

But I keep getting the following error when I try to run the image:

>>> docker run -p 8080:8080 my_app
[nltk_data] Error loading punkt: <urlopen error EOF occurred in
[nltk_data]     violation of protocol (_ssl.c:1129)>
[nltk_data] Error loading punkt: <urlopen error EOF occurred in
[nltk_data]     violation of protocol (_ssl.c:1129)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     EOF occurred in violation of protocol (_ssl.c:1129)>
Traceback (most recent call last):
  File "/app/app.py", line 16, in <module>
    index = VectorstoreIndexCreator().from_loaders(loaders)
  File "/venv/lib/python3.9/site-packages/langchain/indexes/vectorstore.py", line 72, in from_loaders
    docs.extend(loader.load())
  File "/venv/lib/python3.9/site-packages/langchain/document_loaders/unstructured.py", line 70, in load
    elements = self._get_elements()
  File "/venv/lib/python3.9/site-packages/langchain/document_loaders/pdf.py", line 37, in _get_elements
    return partition_pdf(filename=self.file_path, **self.unstructured_kwargs)
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 75, in partition_pdf
    return partition_pdf_or_image(
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 137, in partition_pdf_or_image
    return _partition_pdf_with_pdfminer(
  File "/venv/lib/python3.9/site-packages/unstructured/utils.py", line 43, in wrapper
    return func(*args, **kwargs)
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 248, in _partition_pdf_with_pdfminer
    elements = _process_pdfminer_pages(
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 293, in _process_pdfminer_pages
    _elements = partition_text(text=text)
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text.py", line 89, in partition_text
    elif is_possible_narrative_text(ctext):
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 76, in is_possible_narrative_text
    if exceeds_cap_ratio(text, threshold=cap_threshold):
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 273, in exceeds_cap_ratio
    if sentence_count(text, 3) > 1:
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 222, in sentence_count
    sentences = sent_tokenize(text)
  File "/venv/lib/python3.9/site-packages/unstructured/nlp/tokenize.py", line 38, in sent_tokenize
    return _sent_tokenize(text)
  File "/venv/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
  File "/venv/lib/python3.9/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
  File "/venv/lib/python3.9/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
  File "/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/root/nltk_data'
    - '/venv/nltk_data'
    - '/venv/share/nltk_data'
    - '/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
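
The "EOF occurred in violation of protocol" messages at the top suggest the container cannot complete the TLS handshake when unstructured re-attempts the downloads at runtime. A common (but insecure) workaround, sketched below under that assumption, is to swap urllib's default HTTPS context for an unverified one before calling nltk.download; this disables certificate verification entirely, so it is only reasonable inside a controlled build environment:

```python
import ssl

# NLTK's downloader uses urllib, which honors the process-wide default
# HTTPS context. Replacing it with an unverified context skips
# certificate checks, a common workaround for
# "<urlopen error EOF occurred in violation of protocol>" failures.
ssl._create_default_https_context = ssl._create_unverified_context

# The download can then be retried, e.g.:
# import nltk
# nltk.download('punkt', download_dir='/usr/local/share/nltk_data')
```

A cleaner long-term fix is usually to update the base image's ca-certificates package rather than disabling verification.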
Tags: python, linux, docker, ssl-certificate, nltk