I'm building a local API on macOS that handles inference for an ONNX model (MiDaS) in Python. I use onnxruntime-silicon (a fork of onnxruntime) to run the model on the Apple Silicon GPU, and Flask to serve it. I got my script working with the Flask development server, but I can't get Gunicorn to do the job.
Here is a working Python 3 script (using the development server):
# Server libraries
from flask import Flask

# NN and Image processing libraries
import onnxruntime as ort
from cv2 import imread, imwrite, cvtColor, COLOR_BGR2RGB
import numpy as np

app = Flask(__name__)

# Use a GPU provider if available
providers = ort.get_available_providers()

# Load ONNX model
sess = ort.InferenceSession("models/model-f6b98070.onnx", providers=providers)

def postprocess(depth_map):
    '''Process and save the depth map as a JPG'''
    # Rescale to 0-255, convert to uint8 and save the image
    rescaled = (255.0 / depth_map[0].max() * (depth_map[0] - depth_map[0].min())).astype(np.uint8)
    rescaled = np.squeeze(rescaled)
    imwrite('tmp/depth.jpg', rescaled)

def preprocess(image='tmp/frame.jpg'):
    '''Load and process the image for the model'''
    input_image = imread(image)                         # Load image with OpenCV (384x384 only!)
    input_image = cvtColor(input_image, COLOR_BGR2RGB)  # Convert to RGB
    input_array = np.transpose(input_image, (2, 0, 1))  # Reshape (H,W,C) to (C,H,W)
    input_array = np.expand_dims(input_array, 0)        # Add the batch dimension B
    normalized_input_array = input_array.astype('float32') / 255  # Normalize
    return normalized_input_array

@app.route('/predict', methods=['POST'])
def predict():
    # Load input image
    input_array = preprocess()
    # Process inference
    input_name = sess.get_inputs()[0].name
    results = sess.run(None, {input_name: input_array})
    # Save depth map
    postprocess(results)
    return 'DONE'

if __name__ == '__main__':
    app.run(debug=True)
I can make a request like this:
import requests
response = requests.post('http://127.0.0.1:5000/predict')
print(response.status_code)
The depth map is saved as a JPG, and everything works fine.
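As a sanity check, the preprocessing above can be exercised with a dummy array instead of a real frame, to confirm the exact tensor shape and dtype the model receives (a sketch; no OpenCV or ONNX model needed):

```python
import numpy as np

# Dummy 384x384 RGB frame standing in for the image loaded by imread
dummy_image = np.random.randint(0, 256, size=(384, 384, 3), dtype=np.uint8)

input_array = np.transpose(dummy_image, (2, 0, 1))  # (H,W,C) -> (C,H,W)
input_array = np.expand_dims(input_array, 0)        # add batch dim -> (1,C,H,W)
input_array = input_array.astype('float32') / 255   # normalize to [0, 1]

print(input_array.shape)   # (1, 3, 384, 384)
print(input_array.dtype)   # float32
```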
Now I want to switch from the default Flask server (designed for development) to Gunicorn (a production WSGI server). Starting from the first script, I import the following libraries:
from gunicorn.app.base import BaseApplication
import gunicorn.glogging
import gunicorn.workers.sync
I create a Gunicorn application class:
class GunicornApplication(BaseApplication):

    def __init__(self, app, options=None):
        self.application = app
        self.options = options or {}
        super().__init__()

    def load_config(self):
        for key, value in self.options.items():
            if key in self.cfg.settings and value is not None:
                self.cfg.set(key.lower(), value)

    def load(self):
        return self.application
and launch the server like this:
if __name__ == '__main__':
    options = {'bind': '127.0.0.1:5000', 'workers': 1}
    GunicornApplication(app, options).run()
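For comparison, the embedded GunicornApplication above is equivalent to launching Gunicorn from the command line (assuming the script is saved as server.py; the `server:app` module:variable name is an assumption):

```shell
gunicorn --bind 127.0.0.1:5000 --workers 1 server:app
```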
The server starts without any problem, but when I make an inference request, Python crashes and the script raises the following error:
>>> [ERROR] Worker (pid:10517) was sent SIGSEGV!
and my request gets the following exception:
>>> requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
I know the error is caused by the GPU, because the Gunicorn script works fine if the provider is set to the CPU. Maybe the problem comes from bad communication between the script and the Gunicorn worker processes?
Any help or suggestion is welcome!
From what I understand, each worker needs to load the model into its own block of memory, so I decided to use Gunicorn's post_worker_init hook to load the model properly in each worker:
sess = None

def load_model(_):
    global sess
    # Model loading code, now run once per worker
    providers = ort.get_available_providers()
    sess = ort.InferenceSession("/opt/FCPX Studio/Utils/depth map/models/model-f6b98070.onnx", providers=providers)
class GunicornApplication(BaseApplication):

    [...]

    def load_config(self):
        for key, value in self.options.items():
            if key in self.cfg.settings and value is not None:
                self.cfg.set(key.lower(), value)
        # Set up the post_worker_init hook to load the model.
        self.cfg.set('post_worker_init', load_model)

    [...]
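The per-worker-memory idea can be illustrated without Gunicorn or onnxruntime: with fork-based workers, state created after fork() exists only in the child, so each worker ends up with its own copy of the session while the parent's stays untouched (a minimal sketch, with a string standing in for the InferenceSession):

```python
import multiprocessing as mp

sess = None  # stands in for the ONNX InferenceSession

def load_model():
    # Stand-in for ort.InferenceSession(...): runs in the child, after fork
    global sess
    sess = f"model loaded in {mp.current_process().name}"

def worker(queue):
    load_model()  # per-worker init, like the post_worker_init hook
    queue.put(sess)

ctx = mp.get_context('fork')  # Gunicorn's worker model: fork without exec
queue = ctx.Queue()
procs = [ctx.Process(target=worker, args=(queue,), name=f"worker-{i}") for i in range(2)]
for p in procs:
    p.start()
results = sorted(queue.get() for _ in procs)
for p in procs:
    p.join()
print(results)  # each child built its own sess
print(sess)     # still None in the parent: the copies are independent
```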
This didn't solve the problem, but it got me the following error from the server on the inference request:
objc[51435]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
Fork safety blocks the process and makes Python crash. This answer, as well as this thread, explains the problem well. I'm still not sure what exactly causes fork safety to block the process.
In the meantime, I can disable fork safety with an environment variable, like so:
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES