How can I read a huge CSV file faster?


I tried using pyarrow, but it did not work. My code:

import pandas as pd

df = pd.read_csv("file.csv", engine='pyarrow')

I get this error:

"pyarrow.lib.ArrowInvalid: straddling object straddles two block boundaries (try to increase block size?)"

I could not find any parameter in pd.read_csv to change the block size. Any suggestions?

python pandas csv io pyarrow
1 Answer

To set block_size, you need to use PyArrow directly:

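from pyarrow import csv

# Read the CSV with PyArrow, passing ReadOptions to control the block size.
# block_size is in bytes. The error asks for a larger block, so pick a value
# above PyArrow's default (about 1 MB); 64 MB here is an arbitrary choice.
read_options = csv.ReadOptions(
    block_size=64 * 1024 * 1024,  # <= define the block size here
)

table = csv.read_csv("file.csv", read_options=read_options)

# Convert the PyArrow Table to a pandas DataFrame
df = table.to_pandas()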