Can I run pandas DataFrame.to_csv in a separate thread?


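I have a fairly large pandas DataFrame, and I want to select some rows based on a condition.

The problem is that saving to CSV is separate from the overall flow of the program, and it takes a considerable amount of time.

Is it possible to split off a thread, so that the main thread moves on with the selected rows while the unselected rows are saved to CSV in another thread?

For example...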

# This is pseudocode

import pandas as pd

df = pd.DataFrame({"col1": [x for x in range(10000)], "col2": [x**2 for x in range(0, 10000)]})

df_selected = df[df.apply(lambda x: x.col1 % 3 == 0, axis=1)]
df_unselected = df[df.apply(lambda x: x.col1 % 3 != 0, axis=1)]


def other_thread_save_to_csv(df: pd.DataFrame):
    # This function is the last one to use df_unselected.
    ...  # save df to CSV in another thread


other_thread_save_to_csv(df_unselected)

all_other_handling(df_selected)

python pandas export-to-csv
1 Answer
0 votes

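Try it like this: run the save in a threading.Thread and keep working on the selected rows in the main thread.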

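import threading

import pandas as pd

df = pd.DataFrame({"col1": range(10000), "col2": [x**2 for x in range(10000)]})
# A vectorized boolean mask selects the same rows as the row-wise df.apply(...).
mask = df["col1"] % 3 == 0
df_selected = df[mask]
df_unselected = df[~mask]

def other_thread_save_to_csv(df_unselected):
    # Runs in the background thread; nothing else uses df_unselected afterwards.
    df_unselected.to_csv('unselected_data.csv', index=False)

def all_other_handling(df_selected):
    ...  # placeholder: the rest of your processing on the selected rows

save_csv_thread = threading.Thread(target=other_thread_save_to_csv, args=(df_unselected,))
save_csv_thread.start()  # the CSV write starts in the background
all_other_handling(df_selected)  # the main thread continues in the meantime
save_csv_thread.join()  # wait for the write to finish before relying on the file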
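
If you also want to know whether the save actually succeeded, a variant built on concurrent.futures.ThreadPoolExecutor from the standard library may be handy, because future.result() re-raises any exception that happened in the worker thread. This is only a sketch under assumptions: save_to_csv, process_selected and the temporary output path are made-up names for illustration and are not part of the question. How much wall-clock time you gain depends on how much of the write can overlap with the main thread's work, so it is worth timing on your real data.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def save_to_csv(frame: pd.DataFrame, path: str) -> str:
    """Write frame to path and return the path (runs in the worker thread)."""
    frame.to_csv(path, index=False)
    return path


def process_selected(frame: pd.DataFrame) -> int:
    """Hypothetical stand-in for the main-thread work on the selected rows."""
    return int(frame["col2"].sum())


df = pd.DataFrame({"col1": range(10000), "col2": [x**2 for x in range(10000)]})
mask = df["col1"] % 3 == 0
df_selected, df_unselected = df[mask], df[~mask]

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "unselected_data.csv")

with ThreadPoolExecutor(max_workers=1) as executor:
    # Hand the write to the worker thread and get a Future back immediately.
    future = executor.submit(save_to_csv, df_unselected, out_path)
    # The main thread keeps working on the selected rows in the meantime.
    total = process_selected(df_selected)
    # Blocks until the write is done and re-raises any error raised by to_csv.
    saved_path = future.result()
```

Compared with a bare threading.Thread, the Future gives you one place to wait for the write and to catch its errors, which matters if later code reads the CSV file.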