How do I write pyarrow Parquet data to an S3 bucket?

Question

I created a DataFrame and converted it to Parquet using pyarrow (as also mentioned here):

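import pyarrow as pa
import pyarrow.parquet as pq

def convert_df_to_parquet(self, df):
    # Serialize the DataFrame to Parquet in an in-memory output stream
    table = pa.Table.from_pandas(df)
    buf = pa.BufferOutputStream()
    pq.write_table(table, buf)
    return buf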

Now I want to upload the result to an S3 bucket. I tried different input arguments for upload_file() and put_object(), but none of my attempts worked:

s3_client.upload_file(parquet_file, bucket_name, destination_key)  # 1st
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file)  # 2nd
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file.getvalue())  # 3rd
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file.read1())  # 4th

Error (from the 4th attempt):

 s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file.read1())
  File "pyarrow/io.pxi", line 376, in pyarrow.lib.NativeFile.read1
  File "pyarrow/io.pxi", line 310, in pyarrow.lib.NativeFile.read
  File "pyarrow/io.pxi", line 320, in pyarrow.lib.NativeFile.read
  File "pyarrow/io.pxi", line 155, in pyarrow.lib.NativeFile.get_input_stream
  File "pyarrow/io.pxi", line 170, in pyarrow.lib.NativeFile._assert_readable
OSError: only valid on readonly files

Answer 1

From the docs:

You should do something similar to this:

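import boto3
s3 = boto3.resource('s3')
# upload_file() expects a path to a file on disk, so parquet_file here is
# the name of a Parquet file that has already been written under /tmp
s3.meta.client.upload_file('/tmp/' + parquet_file, bucket_name, parquet_file)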
