I have multiple .csv.gz files that I am trying to read into a dask dataframe. I was able to do that with this code:
import glob
import pandas as pd
import dask.dataframe as dd
from dask import delayed

file_paths = glob.glob(file_pattern)
@delayed
def read_csv(file_paths):
    return dd.read_csv(file_paths, compression='gzip', blocksize=None, dtype=None)
dfs = [delayed(pd.read_csv)(fn) for fn in file_paths]
df = dd.from_delayed(dfs)
The problem is that when I try to convert the dask dataframe into a pandas dataframe using
`df=df.compute()`
I get the error message:
"EmptyDataError: No columns to parse from file"
I would really appreciate any help with this.
The following worked for me:
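import os
import pandas as pd
import dask.dataframe as dd

file_path = r"C:\Users\John Doe\Downloads\checking gz"
dfs = []
files = os.listdir(file_path)
for file in files:
    if '.gz' in file:
        # error_bad_lines=False skips malformed rows. It was deprecated in
        # pandas 1.3 and removed in 2.0; use on_bad_lines='skip' there.
        df = dd.read_csv(file_path + '/' + file, compression='gzip', blocksize=None, error_bad_lines=False)
        dfs.append(df)
        print(df)
# Concatenate the per-file dask dataframes, then materialize them as one pandas dataframe.
new_df = dd.concat(dfs)
pd_df = new_df.compute()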
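
A possible cause of the "EmptyDataError: No columns to parse from file" message, which the thread does not confirm, is that at least one of the .csv.gz files decompresses to nothing: pandas raises that error when a file has no content at all. Below is a minimal sketch for finding such files before handing the rest to dask. The helper name `split_empty_gz` is made up for illustration, and it only treats a file as empty when it has zero decompressed bytes.

```python
import glob
import gzip
import os


def split_empty_gz(file_pattern):
    """Split the files matching file_pattern into (non_empty, empty) lists.

    Hypothetical helper: a file counts as empty when it is zero bytes on
    disk or when its gzip stream decompresses to zero bytes.
    """
    non_empty, empty = [], []
    for path in sorted(glob.glob(file_pattern)):
        # A zero-byte file has no gzip header at all, so check the size first.
        if os.path.getsize(path) == 0:
            empty.append(path)
            continue
        with gzip.open(path, "rb") as fh:
            # Reading a single byte is enough to tell whether any data exists.
            has_data = fh.read(1) != b""
        if has_data:
            non_empty.append(path)
        else:
            empty.append(path)
    return non_empty, empty
```

If `empty` comes back non-empty, those files are worth inspecting. The `non_empty` list can then be passed to `dd.read_csv(non_empty, compression='gzip', blocksize=None)`, since `dd.read_csv` accepts a list of paths.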