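For security reasons, I need to move a file to Azure Data Lake storage without writing it to local disk. The file is an Excel workbook created with the xlsxwriter package. This is what I tried; it raises ValueError: Seek only available in read mode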
import pandas as pd
from azure.datalake.store import core, lib, multithread
import xlsxwriter as xl
# Dataframes have undergone manipulation not listed in this code and come from a DB connection
matrix = pd.DataFrame(Database_Query1)
raw = pd.DataFrame(Database_Query2)
# Name datalake path for workbook
dlpath = '/datalake/file/path/file_name.xlsx'
# List store name
store_name = 'store_name_here'
# Create auth token
token = lib.auth(tenant_id='tenant_id_here',
                 client_id='client_id_here',
                 client_secret='client_secret_here')
# Create management file system client object
adl = core.AzureDLFileSystem(token, store_name= store_name)
# Create workbook structure
writer = pd.ExcelWriter(adl.open(dlpath, 'wb'), engine= 'xlsxwriter')
matrix.to_excel(writer, sheet_name= 'Compliance')
raw.to_excel(writer, sheet_name= 'Raw Data')
writer.save()
Any ideas? Thanks in advance.
If the data is not very large, you could keep the bytes in memory and then dump the stream to adl:
from io import BytesIO
xlb = BytesIO()
# ... do what you need to do ... #
writer = pd.ExcelWriter(xlb, engine= 'xlsxwriter')
matrix.to_excel(writer, sheet_name= 'Compliance')
raw.to_excel(writer, sheet_name= 'Raw Data')
writer.close()  # finalizes the workbook into xlb; writer.save() was removed in pandas 2.0
# Set the cursor of the stream back to the beginning
xlb.seek(0)
with adl.open(dlpath, 'wb') as fl:
    # This part I'm not entirely sure - consult what your adl write methods are
    fl.write(xlb.read())
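A likely reading of the error is that an .xlsx file is a zip archive, so the writer seeks on its target while building it, and a handle opened with 'wb' refuses to seek. The sketch below reproduces that pattern with the standard library only, so it can be run without Azure credentials. Everything named here is an assumption for illustration: WriteOnlySink is a hypothetical stand-in for the handle returned by adl.open(dlpath, 'wb'), zipfile stands in for xlsxwriter, and build_archive and upload_via_buffer are made-up helper names.

```python
import io
import zipfile


class WriteOnlySink:
    """Hypothetical stand-in for a remote file opened with 'wb'.

    It accepts writes and reports its position, but refuses to seek,
    mirroring the error message quoted in the question.
    """

    def __init__(self):
        self.chunks = []
        self.position = 0

    def write(self, data):
        self.chunks.append(bytes(data))
        self.position += len(data)
        return len(data)

    def tell(self):
        return self.position

    def seek(self, offset, whence=0):
        raise ValueError("Seek only available in read mode")

    def flush(self):
        pass


def build_archive(target):
    """Write a tiny zip archive (standing in for the workbook) to target."""
    with zipfile.ZipFile(target, "w") as archive:
        archive.writestr("sheet1.xml", "<worksheet/>")


def upload_via_buffer(sink):
    """Build the archive in memory, then hand the finished bytes to sink."""
    buffer = io.BytesIO()          # seekable, so the archive writer is happy
    build_archive(buffer)
    payload = buffer.getvalue()    # full contents, independent of the cursor
    sink.write(payload)            # a single sequential write, no seek needed
    return len(payload)


if __name__ == "__main__":
    try:
        build_archive(WriteOnlySink())
    except ValueError as exc:
        print("direct write failed:", exc)

    sink = WriteOnlySink()
    print("bytes uploaded via buffer:", upload_via_buffer(sink))
```

Using getvalue() instead of seek(0) followed by read() gives the same bytes and does not depend on where the cursor was left. For a real workbook the same shape would apply with pd.ExcelWriter(buffer, engine='xlsxwriter') in place of build_archive, though whether the adl handle accepts the whole payload in one write call is worth checking against its documentation.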