Error copying a file from Blob to Snowflake through ADF: Encrypted excel file 'Reports.csv' is not supported, please remove its password


While loading data from Azure Blob Storage into Snowflake, I get the following error:

ErrorCode=EncryptedExcelIsNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Encrypted excel file 'Reports.csv' is not supported, please remove its password.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=ICSharpCode.SharpZipLib.Zip.ZipException,Message=Wrong Local header signature: 0x6167724F,Source=ICSharpCode.SharpZipLib,'

The file is neither encrypted nor password-protected, though. Could you help me resolve this? Thanks in advance.

snowflake-cloud-data-platform azure-data-factory copy-data
1 Answer
ErrorCode=EncryptedExcelIsNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Encrypted excel file 'Reports.csv' is not supported, please remove its password.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=ICSharpCode.SharpZipLib.Zip.ZipException,Message=Wrong Local header signature: 0x6167724F,Source=ICSharpCode.SharpZipLib,'
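The inner `ZipException` reports the four bytes that were found where a ZIP local file header was expected. As a rough check, and assuming the value is printed as a little-endian 32-bit integer (which is how the ZIP format stores its signatures), those bytes can be decoded with a few lines of Python. The helper name `signature_bytes` is made up for this sketch.

```python
import struct

# Value reported by the inner ZipException in the error message.
reported_signature = 0x6167724F

# Signature that begins every ZIP local file header, i.e. the bytes "PK\x03\x04".
ZIP_LOCAL_HEADER = 0x04034B50


def signature_bytes(value: int) -> bytes:
    """Return the four on-disk bytes of a little-endian 32-bit value."""
    return struct.pack("<I", value)


expected = signature_bytes(ZIP_LOCAL_HEADER)
found = signature_bytes(reported_signature)

print("expected:", expected)  # b'PK\x03\x04'
print("found:   ", found)     # b'Orga'
```

Under that assumption the file starts with the ASCII text `Orga` rather than `PK\x03\x04`, so it does not begin like an .xlsx container. It may be worth confirming whether `Reports.csv` is being read through an Excel-format dataset.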

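The error indicates that your file is still encrypted or protected. ADF does not currently support encrypted or password-protected Excel files, so you can use the following workaround: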

Read the password-protected Excel file with Python in Azure Databricks using the following code:

import io

import msoffcrypto
import openpyxl
from azure.storage.blob import BlobServiceClient

connection_string = "<blobStorageConnectionString>"

# Create a BlobServiceClient from the storage account connection string
blob_service_client = BlobServiceClient.from_connection_string(connection_string)

# Get the blob client for your Excel file
blob_client = blob_service_client.get_blob_client(container="<containerName>", blob="<xlsxFilePath>")

# Download the blob contents into an in-memory stream
stream = io.BytesIO()
blob_client.download_blob().readinto(stream)
stream.seek(0)  # rewind so the file is read from the beginning

# Decrypt the workbook into a second in-memory stream
decrypted_workbook = io.BytesIO()
office_file = msoffcrypto.OfficeFile(stream)
office_file.load_key(password="<password>")
office_file.decrypt(decrypted_workbook)
decrypted_workbook.seek(0)  # rewind before handing the stream to a reader

# Load the decrypted workbook with openpyxl and print every row of every sheet
workbook = openpyxl.load_workbook(filename=decrypted_workbook)
for sheet_name in workbook.sheetnames:
    sheet = workbook[sheet_name]
    print(f"Sheet: {sheet_name}")
    for row in sheet.iter_rows(values_only=True):
        print(row)
    print()
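The decryption step above only applies if the blob really is an encrypted Office file. A minimal sketch for checking the downloaded bytes first, using only the leading magic bytes; `sniff_container` is a hypothetical helper, not part of any library, and the sample inputs are made up:

```python
import io

ZIP_MAGIC = b"PK\x03\x04"                        # plain .xlsx (ZIP container)
OLE_MAGIC = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"  # OLE compound file


def sniff_container(stream: io.BytesIO) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    stream.seek(0)
    head = stream.read(8)
    stream.seek(0)  # leave the stream rewound for the next reader
    if head.startswith(ZIP_MAGIC):
        return "zip"    # unencrypted .xlsx, or some other ZIP archive
    if head.startswith(OLE_MAGIC):
        return "ole"    # encrypted .xlsx or legacy .xls
    return "other"      # most likely plain text such as CSV


# Made-up sample inputs to show the three outcomes.
print(sniff_container(io.BytesIO(b"PK\x03\x04...")))      # zip
print(sniff_container(io.BytesIO(OLE_MAGIC + b"...")))    # ole
print(sniff_container(io.BytesIO(b"col_a,col_b\n1,2\n"))) # other
```

Only the `ole` case needs a password. A result of `other` suggests the file is already text and can be read as a delimited file directly.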

This reads the Excel file and prints the contents of each worksheet.

You can then convert these worksheets to CSV and upload them to Blob storage with the following code:

import pandas as pd

# Reuses decrypted_workbook and blob_service_client from the previous snippet
decrypted_workbook.seek(0)
xls = pd.ExcelFile(decrypted_workbook)

# Iterate over each sheet and convert it to CSV
for sheet_name in xls.sheet_names:
    df = pd.read_excel(xls, sheet_name=sheet_name)
    csv_data = df.to_csv(index=False)

    # Upload the CSV under outputs/ in the "files" container (replace with your own container)
    csv_blob_name = f"outputs/{sheet_name}.csv"
    csv_blob_client = blob_service_client.get_blob_client(container="files", blob=csv_blob_name)
    csv_blob_client.upload_blob(csv_data, overwrite=True)

    print(f"CSV file for sheet '{sheet_name}' uploaded to Azure Blob Storage.")

print("All sheets converted to CSV and uploaded successfully.")

All worksheets are then available in Blob storage as CSV files.

Use these CSV files as the source and copy the data into Snowflake. For more information, you can refer to this MS question.
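For that final copy step, the source dataset should be a delimited-text dataset rather than an Excel one. Here is a rough sketch of what such a dataset definition might look like, built as a Python dict and printed as JSON. The dataset name `ReportsCsv` and the linked service name are placeholders, and the container and folder simply mirror the upload code above. The property names follow the ADF DelimitedText format as I understand it, so check them against the current documentation.

```python
import json

# All names below are placeholders for illustration, not values from the question.
csv_dataset = {
    "name": "ReportsCsv",
    "properties": {
        "type": "DelimitedText",  # CSV format, not "Excel"
        "linkedServiceName": {
            "referenceName": "<blobStorageLinkedService>",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "files",
                "folderPath": "outputs",
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}

dataset_json = json.dumps(csv_dataset, indent=2)
print(dataset_json)
```

The printed JSON can be compared with the dataset definition shown in the ADF authoring UI's code view to confirm that the source type is `DelimitedText`.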
