pandas + multiprocessing: "NotImplementedError: Not supported for DataFrames!"


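After my previous thread was marked as a duplicate, which pointed me toward multiprocessing managers, I am now trying to use multiprocessing to create a service that holds my pandas DataFrame and serves it to Flask requests. This is my code so far: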

df_manager.py

from multiprocessing.managers import BaseManager
import pandas as pd

def init_dataframe():
    return pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

def get_df():
    return df

df = init_dataframe()
manager = BaseManager(('', 37844), b'password')
manager.register('get_df', get_df)
server = manager.get_server()
server.serve_forever()

data_handler.py

from multiprocessing.managers import BaseManager
import pandas as pd

def get_df():
    manager = BaseManager(('', 37844), b'password')
    manager.register('get_df')
    manager.connect()
    return manager.get_df()

def data():
    df = get_df()
    return df.to_dict()

if __name__ == '__main__':
    data()

Unfortunately, this raises an exception when manager.get_df() is called in data_handler.py.

Traceback (most recent call last):
  File "src/data_handler.py", line 15, in <module>
    data()
  File "src/data_handler.py", line 11, in data
    df = get_df()
  File "src/data_handler.py", line 8, in get_df
    return manager.get_df()
  File "/usr/lib/python3.7/multiprocessing/managers.py", line 724, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/usr/lib/python3.7/multiprocessing/managers.py", line 609, in _create
    id, exposed = dispatch(conn, None, 'create', (typeid,)+args, kwds)
  File "/usr/lib/python3.7/multiprocessing/managers.py", line 82, in dispatch
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError: 
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/managers.py", line 201, in handle_request
    result = func(c, *args, **kwds)
  File "/usr/lib/python3.7/multiprocessing/managers.py", line 391, in create
    exposed = public_methods(obj)
  File "/usr/lib/python3.7/multiprocessing/managers.py", line 122, in public_methods
    return [name for name in all_methods(obj) if name[0] != '_']
  File "/usr/lib/python3.7/multiprocessing/managers.py", line 113, in all_methods
    func = getattr(obj, name)
  File "/home/admin/dev/pandas-multiprocessing/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 392, in _constructor_expanddim
    raise NotImplementedError("Not supported for DataFrames!")
NotImplementedError: Not supported for DataFrames!
---------------------------------------------------------------------------

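Any help in the right direction would be greatly appreciated!

EDIT: This seems to be caused by DataFrames, because returning df.to_json() instead of just df in df_manager.py appears to work fine. Still investigating...

EDIT2: I have updated the code to remove the Flask dependency, since it seems to be unrelated.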

Git repo

python pandas flask python-multiprocessing multiprocessing-manager
1 Answer

This issue was solved by explicitly exposing the relevant methods through the proxy used by BaseManager. This can be done in the register call in df_manager.py.
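For context, the traceback in the question shows where the failure comes from: when no `exposed` argument is given, the server calls `public_methods(obj)`, which reads every attribute listed by `dir(obj)`, and the DataFrame attribute `_constructor_expanddim` raises on access. A minimal sketch of that mechanism follows. It assumes a hypothetical `FakeFrame` class in place of a real DataFrame so that it runs without pandas, and it calls `public_methods` directly, which is the internal helper named in the traceback rather than a documented API.

```python
from multiprocessing.managers import public_methods


class FakeFrame:
    """Hypothetical stand-in for a DataFrame (not a pandas class)."""

    @property
    def _constructor_expanddim(self):
        # Mirrors the attribute that raises in the question's traceback.
        raise NotImplementedError("Not supported for DataFrames!")

    def to_dict(self):
        return {'col1': {0: 1, 1: 2}}


try:
    # The scan touches every attribute, including the one that raises.
    public_methods(FakeFrame())
    scan_error = None
except NotImplementedError as exc:
    scan_error = str(exc)

print(scan_error)
```

Passing `exposed` to `register` means the server does not need to run this scan, which is presumably why the change below avoids the error.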

df_manager.py

from multiprocessing.managers import BaseManager
import pandas as pd

def init_dataframe():
    return pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

def get_df():
    return df

df = init_dataframe()
manager = BaseManager(('', 37844), b'password')
manager.register('get_df', callable=get_df, exposed='get_df') # Adding `exposed` parameter was the key to solving the issue
server = manager.get_server()
server.serve_forever()

data_handler.py

from multiprocessing.managers import BaseManager
import pandas as pd

def get_df():
    manager = BaseManager(('', 37844), b'password')
    manager.register('get_df')
    manager.connect()
    return manager.get_df()

def data():
    df = get_df()
    return df

if __name__ == '__main__':
    print(data())
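
One thing to keep in mind is that manager.get_df() returns a proxy to the object living in the server process, not the DataFrame itself, and the Python documentation describes `exposed` as a sequence of method names that the proxy may call. The sketch below illustrates that round trip under some loudly stated assumptions: `FakeFrame` is a hypothetical stand-in for a DataFrame so the example runs without pandas, `to_dict` is assumed to be the method the client wants, the two manager subclasses are illustrative names, and the server runs in a daemon thread only so that everything fits in one script (the code above uses two separate processes).

```python
import threading
from multiprocessing.managers import BaseManager


class FakeFrame:
    """Hypothetical stand-in for a DataFrame (not a pandas class)."""

    def __init__(self, data):
        self._data = data

    @property
    def _constructor_expanddim(self):
        # Mirrors the attribute that raises in the question's traceback.
        raise NotImplementedError("Not supported for DataFrames!")

    def to_dict(self):
        return dict(self._data)


frame = FakeFrame({'col1': [1, 2], 'col2': [3, 4]})


def get_df():
    return frame


# Separate subclasses keep the server and client registries apart,
# since both live in the same process in this sketch.
class ServerManager(BaseManager):
    pass


class ClientManager(BaseManager):
    pass


# `exposed` as a tuple of method names that the proxy is allowed to call.
ServerManager.register('get_df', callable=get_df, exposed=('to_dict',))

# Port 0 lets the OS pick a free port; the real address is read back below.
server = ServerManager(('127.0.0.1', 0), b'password').get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()

ClientManager.register('get_df')
client = ClientManager(server.address, b'password')
client.connect()

proxy = client.get_df()    # a proxy object, not a FakeFrame
result = proxy.to_dict()   # executed in the server, result sent back
print(result)
```

If a local copy of the whole object is needed instead, proxies also provide `_getvalue()`, which the documentation describes as returning a copy of the referent; with a real DataFrame that should give a local object on which any pandas method can be called, at the cost of pickling the full frame on each call.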