TensorFlowModel deployment error: dependencies from the provided requirements.txt are not installed

Problem description · Votes: 0 · Answers: 2

I am trying to deploy a TensorFlowModel and provide post-processing in an inference.py file...

I had previously deployed the model successfully and invoked it from a notebook, doing the post-processing in the Jupyter notebook, using the following code:

model = TensorFlowModel(
    name=name_from_base('tf-yolov4'),
    model_data=model_artifact,
    role=role,
    framework_version='2.3'
)

Now I want to do the post-processing by providing an inference.py file, so I followed the documentation here: https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/using_tf.html#sagemaker-tensorflow-docker-containers

and used this snippet:

from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(entry_point='inference.py',
                        dependencies=['requirements.txt'],
                        model_data='s3://mybucket/model.tar.gz',
                        role='MySageMakerRole')

The dependencies I added in requirements.txt:

numpy
tensorflow

My problem: the deployment process, when I call

predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')

does not complete, and when I check CloudWatch I see the following:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 162, in init_process
    super().init_process()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
    mod = importlib.import_module(module)
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/sagemaker/python_service.py", line 414, in <module>
    resources = ServiceResources()
  File "/sagemaker/python_service.py", line 400, in __init__
    self._python_service_resource = PythonServiceResource()
  File "/sagemaker/python_service.py", line 83, in __init__
    self._handler, self._input_handler, self._output_handler = self._import_handlers()
  File "/sagemaker/python_service.py", line 278, in _import_handlers
    spec.loader.exec_module(inference)
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/ml/model/code/inference.py", line 2, in <module>
    import numpy as np

ModuleNotFoundError: No module named 'numpy'

This leads me to believe that the container picks up my inference.py but not the requirements.txt file I provided, hence "No module named 'numpy'"!

My question: what am I doing wrong in my code, and how can I make sure the dependencies needed to run inference.py get installed?

Thanks in advance!

amazon-web-services tensorflow amazon-sagemaker
2 Answers
0 votes

Answer: SageMaker requires your model artifacts to be compressed in a .tar.gz file. SageMaker automatically extracts this .tar.gz into the /opt/ml/model/ directory inside the container. If you are using one of the framework containers, such as TensorFlow, PyTorch, or MXNet, the container expects your tar archive to be structured as follows for TensorFlow:

model.tar.gz/
            |--[model_version_number]/
            |        |--variables/
            |        |--saved_model.pb
            |--code/
                     |--inference.py
                     |--requirements.txt
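To make that layout concrete, here is a minimal repacking sketch (my own illustration, not from the answer; the placeholder files at the top stand in for a real TF SavedModel export, and all paths are assumptions):

```shell
# Placeholders only: in practice these come from your TF SavedModel export.
mkdir -p saved_model/variables
touch saved_model/saved_model.pb saved_model/variables/variables.index
touch inference.py
printf 'numpy\ntensorflow\n' > requirements.txt

# Repack into the layout the SageMaker TF Serving container expects:
# a numbered model-version directory plus a code/ directory at the top level.
mkdir -p export/1 export/code
cp -r saved_model/variables saved_model/saved_model.pb export/1/
cp inference.py requirements.txt export/code/
tar -czf model.tar.gz -C export 1 code

# Inspect the archive: it should list 1/saved_model.pb, 1/variables/...,
# code/inference.py and code/requirements.txt.
tar -tzf model.tar.gz
```

After uploading the archive (e.g. with `aws s3 cp model.tar.gz s3://mybucket/model.tar.gz`), point `model_data` at that S3 URI; the container should then install requirements.txt before loading inference.py.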

0 votes

Hi, how did you solve this problem? Could you help me?
