Is it possible to send a file generated by xlsxwriter to Azure Data Lake without writing it to local disk?


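For security reasons, I need to move a file to Azure Data Lake Storage without writing it locally. The file is an Excel workbook created with the xlsxwriter package. This is what I tried; it raises ValueError: Seek only available in read mode
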
import pandas as pd
from azure.datalake.store import core, lib, multithread
import xlsxwriter as xl

# Dataframes have undergone manipulation not listed in this code and come from a DB connection
matrix = pd.DataFrame(Database_Query1)
raw = pd.DataFrame(Database_Query2)

# Name datalake path for workbook
dlpath = '/datalake/file/path/file_name.xlsx'

# List store name
store_name = 'store_name_here'

# Create auth token
token = lib.auth(tenant_id= 'tenant_id_here',
                 client_id= 'client_id_here',
                 client_secret= 'client_secret_here')

# Create management file system client object
adl = core.AzureDLFileSystem(token, store_name= store_name)

# Create workbook structure
writer = pd.ExcelWriter(adl.open(dlpath, 'wb'), engine= 'xlsxwriter')
matrix.to_excel(writer, sheet_name= 'Compliance')
raw.to_excel(writer, sheet_name= 'Raw Data')

writer.save()

Any ideas? Thanks in advance.
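A plausible explanation for the error: an .xlsx workbook is a ZIP archive, and xlsxwriter presumably builds it with Python's zipfile module, which calls tell() and then seek() on the output object when it opens it for writing. The handle returned by adl.open(dlpath, 'wb') apparently only allows seek() in read mode, so the ValueError escapes. The minimal sketch that follows reproduces this with the standard library only. WriteOnlyFile is a hypothetical stand-in for the ADL handle (it is not part of azure-datalake-store), and writing a tiny archive with zipfile stands in for xlsxwriter.

```python
import zipfile


class WriteOnlyFile:
    """Hypothetical stand-in for the handle from adl.open(path, 'wb'):
    it accepts write() calls but refuses to seek while in write mode."""

    def __init__(self):
        self.chunks = []
        self._pos = 0

    def write(self, data):
        self.chunks.append(bytes(data))
        self._pos += len(data)
        return len(data)

    def tell(self):
        return self._pos

    def seek(self, offset, whence=0):
        # Mimics the restriction reported in the question
        raise ValueError("Seek only available in read mode")

    def flush(self):
        pass


def build_archive(target):
    # An .xlsx file is a ZIP container; write one small member into `target`
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")


try:
    build_archive(WriteOnlyFile())
    error = None
except ValueError as exc:
    # zipfile only tolerates AttributeError/OSError from seek(), so this propagates
    error = exc

print(error)  # Seek only available in read mode
```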

python azure-data-lake xlsxwriter
1 Answer (1 vote)

If the data isn't very large, you could keep the bytes in memory and then write the stream to ADL in one go:

from io import BytesIO

xlb = BytesIO()
# ... do what you need to do ... #

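# Build the workbook in the in-memory buffer; leaving the with-block closes the
# writer (writer.save() was removed in pandas 2.0), which finalizes the .xlsx in xlb
with pd.ExcelWriter(xlb, engine='xlsxwriter') as writer:
    matrix.to_excel(writer, sheet_name='Compliance')
    raw.to_excel(writer, sheet_name='Raw Data')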

# Set the cursor of the stream back to the beginning
xlb.seek(0) 

with adl.open(dlpath, 'wb') as fl:
    # I'm not entirely sure about this part; check which write methods your ADL client provides
    fl.write(xlb.read())
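The idea can be checked without an Azure account. The minimal sketch that follows builds a ZIP archive (the container format of .xlsx, standing in for the pandas/xlsxwriter step) in an io.BytesIO, where seeking is allowed, and then hands the finished bytes to a write-only target in a single sequential write. WriteOnlyFile and build_archive are hypothetical helpers for illustration; WriteOnlyFile only mimics the "no seek in write mode" behaviour of the ADL handle.

```python
import io
import zipfile


class WriteOnlyFile:
    """Hypothetical stand-in for the handle from adl.open(path, 'wb'):
    it accepts write() calls but refuses to seek while in write mode."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

    def seek(self, offset, whence=0):
        raise ValueError("Seek only available in read mode")


def build_archive(target):
    # Stand-in for the ExcelWriter step: .xlsx files are ZIP archives
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")


def upload_via_buffer(sink):
    buffer = io.BytesIO()
    build_archive(buffer)  # zipfile may seek freely on the in-memory buffer
    sink.write(buffer.getvalue())  # one sequential write, no seek on the sink


sink = WriteOnlyFile()
upload_via_buffer(sink)
data = b"".join(sink.chunks)

# The uploaded bytes are a complete, readable archive
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    print(zf.namelist())  # ['xl/workbook.xml']
```

In the question's setting, build_archive corresponds to the pd.ExcelWriter(..., engine='xlsxwriter') code writing into the BytesIO, and sink to the handle from adl.open(dlpath, 'wb'). Using getvalue() instead of seek(0) followed by read() returns the whole buffer regardless of the current stream position. The whole workbook is held in memory, which is why this suits data that isn't very large.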