I tried using pyarrow, but it did not work. My code:
df = pd.read_csv("file.csv", engine='pyarrow')
I get this error:
"pyarrow.lib.ArrowInvalid: straddling object straddles two block boundaries (try to increase block size?)"
I can't find any parameter to change the block size. Any suggestions?
To set block_size, you need to use PyArrow directly:
from pyarrow import csv
# read CSV using PyArrow with ReadOptions
read_options = csv.ReadOptions(
    # block size in bytes; the error asks for a larger block, so go above
    # the default (about 1 MB, as far as I know). 16 MB is only an example.
    block_size=16 * 1024 * 1024,
)
table = csv.read_csv("file.csv", read_options=read_options)
# convert PyArrow Table to pandas DataFrame
df = table.to_pandas()