How to use a multiprocessing pool for a Pandas apply function

Question

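I want to use a multiprocessing pool with a Pandas DataFrame. I tried the code below, but it raises the following error. Is it not possible to use a pool with a Series?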

from multiprocessing import Pool

split = np.array_split(split,4)
pool = Pool(processes=4)
df = pd.concat(pool.map(split['Test'].apply(lambda x : test(x)), split))
pool.close()
pool.join()

The error message is:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not str
Tags: python-3.x pandas python-multiprocessing
1 Answer

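`np.array_split` returns a list of chunks, so `split['Test']` indexes a list with a string, which raises the `TypeError`. `pool.map` also expects a callable as its first argument, not the result of calling `apply`. Pass the function itself and the list of chunks instead: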

import pandas as pd
import numpy as np
import multiprocessing as mp

def test(x):
    return x * 2

if __name__ == '__main__':
    # Demo dataframe
    df = pd.DataFrame({'Test': range(100)})

    # Extract the Series and split it into chunks
    split = np.array_split(df['Test'], 4)

    # Parallel processing
    with mp.Pool(4) as pool:
        data = pool.map(test, split)

    # Concatenate results
    out = pd.concat(data)

Output:

>>> df
    Test
0      0
1      1
2      2
3      3
4      4
..   ...
95    95
96    96
97    97
98    98
99    99

[100 rows x 1 columns]

>>> out
0       0
1       2
2       4
3       6
4       8
     ... 
95    190
96    192
97    194
98    196
99    198
Name: Test, Length: 100, dtype: int64