While loading data from Azure Blob Storage into Snowflake, I get the following error:
ErrorCode=EncryptedExcelIsNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Encrypted excel file 'Reports.csv' is not supported, please remove its password.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=ICSharpCode.SharpZipLib.Zip.ZipException,Message=Wrong Local header signature: 0x6167724F,Source=ICSharpCode.SharpZipLib,'
However, the file is not encrypted or password-protected. Could you help me resolve this? Thanks in advance.
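The error indicates that your file is still encrypted or protected. ADF does not currently support encrypted or protected Excel files, so you can follow the workaround below.
Read the password-protected Excel file with Python in Azure Databricks, using the following code: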
import io
import msoffcrypto
import openpyxl
from azure.storage.blob import BlobServiceClient
connection_string = "<blobStorageConnectionString>"
# Create a BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
# Get the blob client for your Excel file
blob_client = blob_service_client.get_blob_client(container="<containerName>", blob="<xlsxFilePath>")
# Download the blob contents into an in-memory stream
stream = io.BytesIO()
blob_client.download_blob().readinto(stream)
# Rewind so the next reader starts from the first byte
stream.seek(0)
# Decrypt the workbook
decrypted_workbook = io.BytesIO()
office_file = msoffcrypto.OfficeFile(stream)
office_file.load_key(password='<password>')
office_file.decrypt(decrypted_workbook)
# Load the workbook using openpyxl
workbook = openpyxl.load_workbook(filename=decrypted_workbook)
# Print every row of every sheet
for sheet_name in workbook.sheetnames:
    sheet = workbook[sheet_name]
    print(f"Sheet: {sheet_name}")
    for row in sheet.iter_rows(values_only=True):
        print(row)
    print()
This reads the workbook and prints each sheet's name followed by its rows.
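Since the question states that the file is not password-protected, it may be worth checking what the blob actually starts with before decrypting. The sketch below is not part of the original answer; `sniff_container` and `signature_to_bytes` are hypothetical helper names. It relies on two well-known magic numbers (the ZIP local header `PK\x03\x04` used by unencrypted .xlsx files, and the OLE2 compound-file header used by legacy .xls and password-encrypted Office files) and assumes the signature in the ZipException is reported as a little-endian 32-bit integer, the way the ZIP format stores it.

```python
import struct

# Well-known file signatures
ZIP_MAGIC = b"PK\x03\x04"                        # unencrypted .xlsx (ZIP container)
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2: legacy .xls or encrypted Office file


def sniff_container(head: bytes) -> str:
    """Classify a file by its leading bytes (hypothetical helper)."""
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE_MAGIC):
        return "ole"
    return "other"


def signature_to_bytes(signature: int) -> bytes:
    """Convert a 32-bit signature back into the raw bytes it was read from.

    Assumption: the value was read as a little-endian unsigned integer.
    """
    return struct.pack("<I", signature)


if __name__ == "__main__":
    # Signature taken from the error message in the question
    raw = signature_to_bytes(0x6167724F)
    print(raw, sniff_container(raw))
```

Under that assumption, the signature from the error decodes to the ASCII text `Orga`, which is neither a ZIP nor an OLE2 header. That would suggest `Reports.csv` may be a plain text file being read through an Excel dataset, in which case the decryption step would not apply and a delimited-text source may be the better fit. The same check can be run on the downloaded data with `sniff_container(stream.getvalue()[:8])`.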
You can use the following code to convert these sheets to CSV and upload them to Blob Storage:
import pandas as pd
# Rewind the decrypted stream, then load the workbook using pandas
decrypted_workbook.seek(0)
xls = pd.ExcelFile(decrypted_workbook)
# Iterate over each sheet and convert it to CSV
for sheet_name in xls.sheet_names:
    df = pd.read_excel(xls, sheet_name=sheet_name)
    csv_data = df.to_csv(index=False)
    # Upload the CSV under outputs/, one blob per sheet
    # ("files" is the container used in this example; replace it with your own)
    csv_blob_name = f"outputs/{sheet_name}.csv"
    csv_blob_client = blob_service_client.get_blob_client(container="files", blob=csv_blob_name)
    csv_blob_client.upload_blob(csv_data, overwrite=True)
    print(f"CSV file for sheet '{sheet_name}' uploaded to Azure Blob Storage.")
print("All sheets converted to CSV and uploaded successfully.")
All sheets are then copied to Blob Storage as CSV files.
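The upload code uses the sheet name directly in the blob name. Sheet names often contain spaces or punctuation, which can make the resulting paths awkward to reference in a copy activity. A minimal sketch of one way to normalize them is shown here; `csv_blob_name` is a hypothetical helper, not part of the original answer, and the `outputs` prefix simply mirrors the folder used in the code above.

```python
import re


def csv_blob_name(sheet_name: str, prefix: str = "outputs") -> str:
    """Build a blob name for a sheet's CSV file (hypothetical helper)."""
    # Collapse every run of characters other than word characters,
    # dots and dashes into a single underscore
    safe = re.sub(r"[^\w.-]+", "_", sheet_name.strip()).strip("_")
    # Fall back to a generic name if nothing usable is left
    return f"{prefix}/{safe or 'sheet'}.csv"


if __name__ == "__main__":
    print(csv_blob_name("Q1 Report (final)"))
```

Two different sheet names can normalize to the same blob name, so with `overwrite=True` one CSV would silently replace the other; checking for duplicates before uploading would avoid that.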
Use these CSV files as the source and copy the data into Snowflake. For more information, you can refer to this MS question.