I'm building a local API on macOS that handles inference for an ONNX model (MiDaS) in Python. I use onnxruntime-silicon (a fork of onnxruntime) to run the model on the Apple Silicon GPU, and Flask to serve it. I got my script working with the Flask development server, but I can't get Gunicorn to do the job.
Here is a working Python 3 script (using the development server):
# Server libraries
from flask import Flask

# NN and Image processing libraries
import onnxruntime as ort
from cv2 import imread, imwrite, cvtColor, COLOR_BGR2RGB
import numpy as np

app = Flask(__name__)

# Use a GPU provider if available
providers = ort.get_available_providers()

# Load ONNX model
sess = ort.InferenceSession("models/model-f6b98070.onnx", providers=providers)

def postprocess(depth_map):
    '''Process and save the depth map as a JPG'''
    # Rescale to 0-255, convert to uint8 and save the image
    rescaled = (255.0 / depth_map[0].max() * (depth_map[0] - depth_map[0].min())).astype(np.uint8)
    rescaled = np.squeeze(rescaled)
    imwrite('tmp/depth.jpg', rescaled)

def preprocess(image='tmp/frame.jpg'):
    '''Load and process the image for the model'''
    input_image = imread(image)                         # Load image with OpenCV (384x384 only!)
    input_image = cvtColor(input_image, COLOR_BGR2RGB)  # Convert to RGB
    input_array = np.transpose(input_image, (2, 0, 1))  # Reshape (H,W,C) to (C,H,W)
    input_array = np.expand_dims(input_array, 0)        # Add the batch dimension B
    normalized_input_array = input_array.astype('float32') / 255  # Normalize
    return normalized_input_array

@app.route('/predict', methods=['POST'])
def predict():
    # Load input image
    input_array = preprocess()
    # Process inference
    input_name = sess.get_inputs()[0].name
    results = sess.run(None, {input_name: input_array})
    # Save depth map
    postprocess(results)
    return 'DONE'

if __name__ == '__main__':
    app.run(debug=True)
I can make a request like this:
import requests
response = requests.post('http://127.0.0.1:5000/predict')
print(response.status_code)
The depth map is saved as a JPG, and everything works fine.
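As a sanity check, the preprocessing above can be exercised with a dummy array instead of a real frame, to confirm the exact tensor shape and dtype the model receives (a sketch; no OpenCV or ONNX model needed):

```python
import numpy as np

# Dummy 384x384 RGB frame standing in for the image loaded by imread
dummy_image = np.random.randint(0, 256, size=(384, 384, 3), dtype=np.uint8)

input_array = np.transpose(dummy_image, (2, 0, 1))  # (H,W,C) -> (C,H,W)
input_array = np.expand_dims(input_array, 0)        # add batch dim -> (1,C,H,W)
input_array = input_array.astype('float32') / 255   # normalize to [0, 1]

print(input_array.shape)   # (1, 3, 384, 384)
print(input_array.dtype)   # float32
```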
Now I want to switch from the default Flask server (designed for development) to Gunicorn (a production WSGI server). Starting from the first script, I import the following libraries:
from gunicorn.app.base import BaseApplication
import gunicorn.glogging
import gunicorn.workers.sync
I create a Gunicorn application class:
class GunicornApplication(BaseApplication):

    def __init__(self, app, options=None):
        self.application = app
        self.options = options or {}
        super().__init__()

    def load_config(self):
        for key, value in self.options.items():
            if key in self.cfg.settings and value is not None:
                self.cfg.set(key.lower(), value)

    def load(self):
        return self.application
and launch the server like this:
if __name__ == '__main__':
    options = {'bind': '127.0.0.1:5000', 'workers': 1}
    GunicornApplication(app, options).run()
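For comparison, the embedded GunicornApplication above is equivalent to launching Gunicorn from the command line (assuming the script is saved as server.py; the `server:app` module:variable name is an assumption):

```shell
gunicorn --bind 127.0.0.1:5000 --workers 1 server:app
```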
The server starts without any problem, but when I make an inference request, Python crashes and the script raises the following error:
>>> [ERROR] Worker (pid:10517) was sent SIGSEGV!
and my request gets the following exception:
>>> requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
I know the error is caused by the GPU, because the Gunicorn script works fine if the provider is set to the CPU. Maybe the problem comes from bad communication between the script and the Gunicorn worker processes?
Any help or suggestion is welcome!
From what I understand, each worker needs to load the model into its own block of memory, so I decided to use Gunicorn's post_worker_init hook to load the model properly in each worker:
sess = None

def load_model(_):
    global sess
    # Model loading code, now run once per worker
    providers = ort.get_available_providers()
    sess = ort.InferenceSession("/opt/FCPX Studio/Utils/depth map/models/model-f6b98070.onnx", providers=providers)
class GunicornApplication(BaseApplication):

    [...]

    def load_config(self):
        for key, value in self.options.items():
            if key in self.cfg.settings and value is not None:
                self.cfg.set(key.lower(), value)
        # Set up the post_worker_init hook to load the model.
        self.cfg.set('post_worker_init', load_model)

    [...]
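The per-worker-memory idea can be illustrated without Gunicorn or onnxruntime: with fork-based workers, state created after fork() exists only in the child, so each worker ends up with its own copy of the session while the parent's stays untouched (a minimal sketch, with a string standing in for the InferenceSession):

```python
import multiprocessing as mp

sess = None  # stands in for the ONNX InferenceSession

def load_model():
    # Stand-in for ort.InferenceSession(...): runs in the child, after fork
    global sess
    sess = f"model loaded in {mp.current_process().name}"

def worker(queue):
    load_model()  # per-worker init, like the post_worker_init hook
    queue.put(sess)

ctx = mp.get_context('fork')  # Gunicorn's worker model: fork without exec
queue = ctx.Queue()
procs = [ctx.Process(target=worker, args=(queue,), name=f"worker-{i}") for i in range(2)]
for p in procs:
    p.start()
results = sorted(queue.get() for _ in procs)
for p in procs:
    p.join()
print(results)  # each child built its own sess
print(sess)     # still None in the parent: the copies are independent
```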
This didn't solve the problem, but it got me the following error from the server on the inference request:
objc[51435]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
Fork safety blocks the process and makes Python crash. This answer, as well as this thread, explains the problem well. I'm still not sure what exactly causes fork safety to block the process.
In the meantime, I can disable fork safety with an environment variable, like so:
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES