I am trying to deploy a TensorFlowModel and do post-processing in an inference.py file...
I previously deployed the model successfully and invoked it from a notebook, doing the post-processing in the Jupyter notebook itself, using the following code:
model = TensorFlowModel(
    name=name_from_base('tf-yolov4'),
    model_data=model_artifact,
    role=role,
    framework_version='2.3'
)
Now I want to do the post-processing by supplying an inference.py file, so I followed the documentation here: https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/using_tf.html#sagemaker-tensorflow-docker-containers
and used this snippet:
from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(entry_point='inference.py',
                        dependencies=['requirements.txt'],
                        model_data='s3://mybucket/model.tar.gz',
                        role='MySageMakerRole')
The dependencies I added (in requirements.txt):
numpy
tensorflow
My problem: the deployment does not complete when I call

predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')

and when I check CloudWatch I find the following:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 162, in init_process
super().init_process()
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
self.load_wsgi()
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/local/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
return self.load_wsgiapp()
File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
mod = importlib.import_module(module)
File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/sagemaker/python_service.py", line 414, in <module>
resources = ServiceResources()
File "/sagemaker/python_service.py", line 400, in __init__
self._python_service_resource = PythonServiceResource()
File "/sagemaker/python_service.py", line 83, in __init__
self._handler, self._input_handler, self._output_handler = self._import_handlers()
File "/sagemaker/python_service.py", line 278, in _import_handlers
spec.loader.exec_module(inference)
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/ml/model/code/inference.py", line 2, in <module>
import numpy as np
and
ModuleNotFoundError: No module named 'numpy'
This leads me to believe that the container is using my inference.py but not the requirements.txt file I provided, hence there is no module named 'numpy'!
My questions: what am I doing wrong in my code, and how do I make sure the dependencies needed to run inference.py get installed?
Thanks in advance!
A: SageMaker requires your model artifacts to be compressed into a .tar.gz file. SageMaker automatically extracts this .tar.gz file into the /opt/ml/model/ directory of the container. If you are using one of the framework containers, such as TensorFlow, PyTorch, or MXNet, the container expects your tar to be structured as follows (shown here for TensorFlow):
model.tar.gz/
|--[model_version_number]/
|  |--variables/
|  |--saved_model.pb
|--code/
   |--inference.py
   |--requirements.txt
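One way to produce an archive with that layout is to build it yourself before uploading to S3. The sketch below uses Python's standard tarfile module; the local paths (export/1, inference.py, requirements.txt) are hypothetical placeholders for your own SavedModel export directory and script, and the dummy-file setup exists only to make the example self-contained:

```python
import os
import tarfile

# Create a dummy SavedModel layout so this example runs as-is.
# In practice, point these paths at your real export directory
# and your real inference.py / requirements.txt.
os.makedirs("export/1/variables", exist_ok=True)
open("export/1/saved_model.pb", "wb").close()
with open("inference.py", "w") as f:
    f.write("import numpy as np\n")
with open("requirements.txt", "w") as f:
    f.write("numpy\n")

# Build model.tar.gz with the structure the TensorFlow container
# expects: the SavedModel under a numeric version directory, and
# the entry point plus requirements.txt under code/.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("export/1", arcname="1")  # -> 1/saved_model.pb, 1/variables/
    tar.add("inference.py", arcname="code/inference.py")
    tar.add("requirements.txt", arcname="code/requirements.txt")

with tarfile.open("model.tar.gz") as tar:
    print(sorted(tar.getnames()))
```

After uploading this archive to S3 and pointing model_data at it, the container should find code/requirements.txt alongside code/inference.py and install the listed packages at startup.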
Hi, how did you end up solving this? Could you help me?