Theano and Celery: Worker exited prematurely: signal 11 (SIGSEGV)

Question    Votes: 0    Answers: 4

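I am building a web application that, on an ajax request from the client, starts training a neural network implemented with theano on the server side. Obviously, I don't want to wait for the server to finish training the network before sending a response back to my client, since that takes far too long.

So I turned to Celery, which lets me execute code asynchronously on the server side. I use the command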

celery -A CBIR worker -l info
to run the Celery worker. Unfortunately, every time the worker runs my task (which trains my network with theano), I get the following message:

[2015-12-14 19:15:06,790: ERROR/MainProcess] Process 'Worker-3' pid:1610 exited with 'signal 11 (SIGSEGV)'
[2015-12-14 19:15:07,001: ERROR/MainProcess] Task fit[ac40d4d4-5b56-4278-b270-647ef76f3a49] raised unexpected: WorkerLostError('Worker exited prematurely: signal 11 (SIGSEGV).',)
Traceback (most recent call last):
  File "/Users/leo/anaconda/envs/ImgRet/lib/python3.5/site-packages/billiard/pool.py", line 1175, in mark_as_worker_lost
    human_status(exitcode)),
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 11 (SIGSEGV).

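I have been looking into why this error occurs and, as far as I understand, the code I am running suffers from a memory leak. What I don't understand is why my code runs without problems when I don't use Celery, but fails with this error when I do.

Above all, I don't know how to fix this. I used lldb to inspect the generated core dump, and this is my backtrace: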

thread #1: tid = 0x0000, 0x00007fff93b4a9b3 libdispatch.dylib`dispatch_group_async + 533, stop reason = signal SIGSTOP
* frame #0: 0x00007fff93b4a9b3 libdispatch.dylib`dispatch_group_async + 533
frame #1: 0x00007fff7c5b8d40 libdispatch.dylib`_dispatch_root_queues + 1280
frame #2: 0x00007fff9519b228 libBLAS.dylib`APL_dgemm + 1100
frame #3: 0x00007fff951d27aa libBLAS.dylib`cblas_dgemm + 1420
frame #4: 0x0000000104beeb18 multiarray.cpython-35m-darwin.so`gemm + 200
frame #5: 0x0000000104bee3b9 multiarray.cpython-35m-darwin.so`cblas_matrixproduct + 3097
frame #6: 0x0000000104bc01af multiarray.cpython-35m-darwin.so`PyArray_MatrixProduct2 + 207
frame #7: 0x0000000104bc4808 multiarray.cpython-35m-darwin.so`array_matrixproduct + 264
frame #8: 0x00000001000671a9 libpython3.5m.dylib`PyCFunction_Call + 281
frame #9: 0x00000001000f2fbd libpython3.5m.dylib`PyEval_EvalFrameEx + 32029
frame #10: 0x00000001000f4053 libpython3.5m.dylib`PyEval_EvalFrameEx + 36275
frame #11: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #12: 0x00000001000f4ef7 libpython3.5m.dylib`PyEval_EvalCodeEx + 71
frame #13: 0x0000000100041d2a libpython3.5m.dylib`function_call + 186
frame #14: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #15: 0x00000001000e95a7 libpython3.5m.dylib`PyEval_CallObjectWithKeywords + 87
frame #16: 0x00000001042fae3a lazylinker_ext.so`pycall(self=0x0000000108fad3d8, node_idx=13, verbose=0) + 442 at mod.cpp:510
frame #17: 0x00000001042fa869 lazylinker_ext.so`lazy_rec_eval(self=0x0000000108fad3d8, var_idx=24, one=0x000000010026cf60, zero=0x000000010026cf40) + 2089 at mod.cpp:704
frame #18: 0x00000001042fa789 lazylinker_ext.so`lazy_rec_eval(self=0x0000000108fad3d8, var_idx=28, one=0x000000010026cf60, zero=0x000000010026cf40) + 1865 at mod.cpp:690
frame #19: 0x00000001042fa16d lazylinker_ext.so`lazy_rec_eval(self=0x0000000108fad3d8, var_idx=30, one=0x000000010026cf60, zero=0x000000010026cf40) + 301 at mod.cpp:576
frame #20: 0x00000001042fa789 lazylinker_ext.so`lazy_rec_eval(self=0x0000000108fad3d8, var_idx=33, one=0x000000010026cf60, zero=0x000000010026cf40) + 1865 at mod.cpp:690
frame #21: 0x00000001042fa789 lazylinker_ext.so`lazy_rec_eval(self=0x0000000108fad3d8, var_idx=36, one=0x000000010026cf60, zero=0x000000010026cf40) + 1865 at mod.cpp:690
frame #22: 0x00000001042fa789 lazylinker_ext.so`lazy_rec_eval(self=0x0000000108fad3d8, var_idx=41, one=0x000000010026cf60, zero=0x000000010026cf40) + 1865 at mod.cpp:690
frame #23: 0x00000001042fa789 lazylinker_ext.so`lazy_rec_eval(self=0x0000000108fad3d8, var_idx=42, one=0x000000010026cf60, zero=0x000000010026cf40) + 1865 at mod.cpp:690
frame #24: 0x00000001042f83db lazylinker_ext.so`CLazyLinker_call(_self=0x0000000108fad3d8, args=0x0000000100382048, kwds=0x0000000000000000) + 811 at mod.cpp:838
frame #25: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #26: 0x00000001000ed08c libpython3.5m.dylib`PyEval_EvalFrameEx + 7660
frame #27: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #28: 0x00000001000f4ef7 libpython3.5m.dylib`PyEval_EvalCodeEx + 71
frame #29: 0x0000000100041d2a libpython3.5m.dylib`function_call + 186
frame #30: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #31: 0x000000010002a79c libpython3.5m.dylib`method_call + 140
frame #32: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #33: 0x0000000100080743 libpython3.5m.dylib`slot_tp_call + 67
frame #34: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #35: 0x00000001000ed08c libpython3.5m.dylib`PyEval_EvalFrameEx + 7660
frame #36: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #37: 0x00000001000f3d26 libpython3.5m.dylib`PyEval_EvalFrameEx + 35462
frame #38: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #39: 0x00000001000f4ef7 libpython3.5m.dylib`PyEval_EvalCodeEx + 71
frame #40: 0x0000000100041d2a libpython3.5m.dylib`function_call + 186
frame #41: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #42: 0x00000001000eff0b libpython3.5m.dylib`PyEval_EvalFrameEx + 19563
frame #43: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #44: 0x00000001000f4ef7 libpython3.5m.dylib`PyEval_EvalCodeEx + 71
frame #45: 0x0000000100041d2a libpython3.5m.dylib`function_call + 186
frame #46: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #47: 0x000000010002a79c libpython3.5m.dylib`method_call + 140
frame #48: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #49: 0x0000000100080743 libpython3.5m.dylib`slot_tp_call + 67
frame #50: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #51: 0x00000001000eff0b libpython3.5m.dylib`PyEval_EvalFrameEx + 19563
frame #52: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #53: 0x00000001000f3d26 libpython3.5m.dylib`PyEval_EvalFrameEx + 35462
frame #54: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #55: 0x00000001000f4ef7 libpython3.5m.dylib`PyEval_EvalCodeEx + 71
frame #56: 0x0000000100041d2a libpython3.5m.dylib`function_call + 186
frame #57: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #58: 0x00000001000eff0b libpython3.5m.dylib`PyEval_EvalFrameEx + 19563
frame #59: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #60: 0x00000001000f3d26 libpython3.5m.dylib`PyEval_EvalFrameEx + 35462
frame #61: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #62: 0x00000001000f3d26 libpython3.5m.dylib`PyEval_EvalFrameEx + 35462
frame #63: 0x00000001000f4053 libpython3.5m.dylib`PyEval_EvalFrameEx + 36275
frame #64: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #65: 0x00000001000f4ef7 libpython3.5m.dylib`PyEval_EvalCodeEx + 71
frame #66: 0x0000000100041d2a libpython3.5m.dylib`function_call + 186
frame #67: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #68: 0x000000010002a79c libpython3.5m.dylib`method_call + 140
frame #69: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #70: 0x0000000100080471 libpython3.5m.dylib`slot_tp_init + 81
frame #71: 0x000000010007b114 libpython3.5m.dylib`type_call + 212
frame #72: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #73: 0x00000001000ed08c libpython3.5m.dylib`PyEval_EvalFrameEx + 7660
frame #74: 0x00000001000f4053 libpython3.5m.dylib`PyEval_EvalFrameEx + 36275
frame #75: 0x00000001000f4053 libpython3.5m.dylib`PyEval_EvalFrameEx + 36275
frame #76: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #77: 0x00000001000f4ef7 libpython3.5m.dylib`PyEval_EvalCodeEx + 71
frame #78: 0x0000000100041d2a libpython3.5m.dylib`function_call + 186
frame #79: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #80: 0x00000001000eff0b libpython3.5m.dylib`PyEval_EvalFrameEx + 19563
frame #81: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #82: 0x00000001000f4ef7 libpython3.5m.dylib`PyEval_EvalCodeEx + 71
frame #83: 0x0000000100041d2a libpython3.5m.dylib`function_call + 186
frame #84: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #85: 0x000000010002a79c libpython3.5m.dylib`method_call + 140
frame #86: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #87: 0x0000000100080471 libpython3.5m.dylib`slot_tp_init + 81
frame #88: 0x000000010007b114 libpython3.5m.dylib`type_call + 212
frame #89: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #90: 0x00000001000eff0b libpython3.5m.dylib`PyEval_EvalFrameEx + 19563
frame #91: 0x00000001000f4053 libpython3.5m.dylib`PyEval_EvalFrameEx + 36275
frame #92: 0x00000001000f4053 libpython3.5m.dylib`PyEval_EvalFrameEx + 36275
frame #93: 0x00000001000f4053 libpython3.5m.dylib`PyEval_EvalFrameEx + 36275
frame #94: 0x00000001000f4053 libpython3.5m.dylib`PyEval_EvalFrameEx + 36275
frame #95: 0x00000001000f4053 libpython3.5m.dylib`PyEval_EvalFrameEx + 36275
frame #96: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #97: 0x00000001000f4ef7 libpython3.5m.dylib`PyEval_EvalCodeEx + 71
frame #98: 0x0000000100041d2a libpython3.5m.dylib`function_call + 186
frame #99: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #100: 0x00000001000eff0b libpython3.5m.dylib`PyEval_EvalFrameEx + 19563
frame #101: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #102: 0x00000001000f4ef7 libpython3.5m.dylib`PyEval_EvalCodeEx + 71
frame #103: 0x0000000100041d2a libpython3.5m.dylib`function_call + 186
frame #104: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #105: 0x000000010002a79c libpython3.5m.dylib`method_call + 140
frame #106: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #107: 0x0000000100080743 libpython3.5m.dylib`slot_tp_call + 67
frame #108: 0x000000010000d783 libpython3.5m.dylib`PyObject_Call + 99
frame #109: 0x00000001000eff0b libpython3.5m.dylib`PyEval_EvalFrameEx + 19563
frame #110: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #111: 0x00000001000f3d26 libpython3.5m.dylib`PyEval_EvalFrameEx + 35462
frame #112: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #113: 0x00000001000f3d26 libpython3.5m.dylib`PyEval_EvalFrameEx + 35462
frame #114: 0x00000001000f4053 libpython3.5m.dylib`PyEval_EvalFrameEx + 36275
frame #115: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #116: 0x00000001000f3d26 libpython3.5m.dylib`PyEval_EvalFrameEx + 35462
frame #117: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #118: 0x00000001000f3d26 libpython3.5m.dylib`PyEval_EvalFrameEx + 35462
frame #119: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #120: 0x00000001000f3d26 libpython3.5m.dylib`PyEval_EvalFrameEx + 35462
frame #121: 0x00000001000f4053 libpython3.5m.dylib`PyEval_EvalFrameEx + 36275
frame #122: 0x00000001000f4df0 libpython3.5m.dylib`_PyEval_EvalCodeWithName + 2400
frame #123: 0x00000001000f4f51 libpython3.5m.dylib`PyEval_EvalCode + 81
frame #124: 0x0000000100123d4e libpython3.5m.dylib`PyRun_FileExFlags + 206
frame #125: 0x0000000100123fef libpython3.5m.dylib`PyRun_SimpleFileExFlags + 447
frame #126: 0x000000010013c7d7 libpython3.5m.dylib`Py_Main + 3479
frame #127: 0x0000000100000e92 python3`main + 418
frame #128: 0x0000000100000cc4 python3`start + 52

I really don't know how to interpret this backtrace. Thanks in advance for your help!

segmentation-fault celery theano
4 Answers

5 votes

In case anyone runs into the same problem, the fix is to import the theano library inline, inside the task, rather than at module level.

Like this:

import baz
import bar

@app.task
def foo():
    import theano

    # do something with theano

See here for a more detailed explanation.
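A plausible reading of the backtrace in the question is that the crash happens inside libBLAS and libdispatch, which on macOS may not be safe to use in a child process forked after the parent has already loaded them. Celery's default prefork pool forks its worker processes, so deferring the import means the library is first loaded in the process that actually runs the task. The sketch below illustrates only the deferred-import pattern, using the standard library: `fit` is a hypothetical task body, and `statistics` is a stand-in for theano.

```python
import os
import sys


def fit(samples):
    """Hypothetical task body; in the answer above it would carry @app.task."""
    # Deferred import: executed on the first call, in the process that runs
    # the task, rather than when the parent process imports this module.
    import statistics  # stand-in for `import theano`

    return {"pid": os.getpid(), "mean": statistics.mean(samples)}


if __name__ == "__main__":
    result = fit([1.0, 2.0, 3.0])
    # After the first call the module is loaded in this process.
    print(result["mean"], "statistics" in sys.modules)
```

The cost of this pattern is that the first call in each worker process pays the import time; later calls reuse the already loaded module.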


1 vote

FWIW, this also happens with sklearn.cluster.KMeans. If I create the thread myself with threading.Thread, it works fine. If I try to call fit under a Celery worker, I get a sig11.

I have not had the same problem with sklearn.linear_model.LogisticRegression, Ridge, or LinearRegression.
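The thread-based approach mentioned above could look roughly like this. It is a minimal sketch using only the standard library: `fit` is a hypothetical stand-in for a long-running call such as KMeans fitting, and the `results` dictionary is an assumed way to hand the outcome back to the caller.

```python
import threading


def fit(data, results):
    """Hypothetical stand-in for a long-running training call."""
    results["total"] = sum(x * x for x in data)


results = {}
worker = threading.Thread(target=fit, args=([1, 2, 3], results))
worker.start()  # returns immediately; the work continues in the background
worker.join()   # a web request handler would normally not block here
print(results["total"])
```

Because the thread lives in the same process, no fork takes place, which may be why this variant does not crash. The trade-off is that the work is tied to the lifetime of the web process and is not queued or retried the way a Celery task would be.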


1 vote

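I removed every package import from the top of my tasks.py file except the import of the app from .celery,

from <project>.celery import app

and then imported the packages inside the individual task functions. It worked.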


0 votes

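I had a similar problem. However, the SIGSEGV error used to occur only once per worker, per task. So with 3 tasks and 2 workers (green and blue), the first run of each task on green would fail, and the same on blue.

What I tried:

  • Creating a retry handler for WorkerLostError, which turned out to be unnecessary
  • Importing inside the function, which did not help

What worked:

  • Using the -P option (documentation link) when starting the Celery worker
  • I recommend using gevent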
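
For reference, the worker command from the question with a pool option added can be sketched as an argument list. This assumes the option meant above is Celery's -P (--pool) flag and that the gevent package is installed; `worker_command` is a hypothetical helper, not part of Celery.

```python
import shlex


def worker_command(app_name, pool="gevent", loglevel="info"):
    """Build the argv for `celery worker` with an explicit pool (hypothetical helper)."""
    return ["celery", "-A", app_name, "worker", "-l", loglevel, "-P", pool]


if __name__ == "__main__":
    # Print the shell form; the list itself can be passed to subprocess.run.
    print(shlex.join(worker_command("CBIR")))
```

With a gevent pool the tasks run as greenlets inside a single worker process instead of in forked children, so CPU-bound training will not run in parallel across tasks.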