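I am using a data asset to read data from Azure Data Lake into an Azure ML workspace.
However, I would like to know how to write data back to Azure Data Lake. I have a pandas DataFrame and want to save it as CSV/Parquet in the data lake.
Code: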
import mltable
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
mlClient = MLClient.from_config(credential=DefaultAzureCredential())
dataAsset = mlClient.data.get(name="MyDataAsset", version="1")
pathTest = {
    'folder': dataAsset.path
}
tblTest = mltable.from_parquet_files(paths=[pathTest])
dfBaseTest = tblTest.to_pandas_dataframe() # ok, here is my pandas dataframe
##############
# ML operations ... result: dfResult
# How can I save dfResult to my data lake? Is it possible to use the data asset "MyDataAsset", or are data assets read-only?
##############
One possible solution for uploading the data is to use the Azure Data Lake Storage client library for Python.
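import os
from azure.storage.filedatalake import (
    DataLakeServiceClient,
    DataLakeDirectoryClient,
    FileSystemClient,
)
from azure.identity import DefaultAzureCredential

# Replace <Account-Name> with the storage account name.
account_url = "https://<Account-Name>.dfs.core.windows.net"
token_credential = DefaultAzureCredential()
service_client = DataLakeServiceClient(account_url, credential=token_credential)

# Create the file system (container) and a directory inside it.
# Note: create_file_system raises an error if "dataasset2" already exists;
# service_client.get_file_system_client("dataasset2") returns a client for an existing one.
file_system_client = service_client.create_file_system(file_system="dataasset2")
directory_client = file_system_client.create_directory("test")
file_client = directory_client.get_file_client("data.csv")

# Upload the local file data.csv, replacing any existing remote file.
with open(file=os.path.join("", "data.csv"), mode="rb") as data:
    file_client.upload_data(data, overwrite=True)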
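Since the question starts from a pandas DataFrame rather than a file on disk, the upload can probably skip the local file and send the serialized bytes directly. The following is a minimal sketch under stated assumptions, not a definitive implementation: `dataframe_to_bytes` and `upload_dataframe` are hypothetical helper names introduced here, `directory_client` is assumed to be a directory client like the one created above, and the Parquet branch assumes a Parquet engine such as pyarrow is installed.

```python
import io


def dataframe_to_bytes(df, fmt="csv"):
    """Serialize a pandas DataFrame to bytes in memory (hypothetical helper)."""
    if fmt == "csv":
        # to_csv returns a string when no path is given.
        return df.to_csv(index=False).encode("utf-8")
    if fmt == "parquet":
        # to_parquet can write into a file-like object; this needs a
        # Parquet engine (pyarrow or fastparquet) to be installed.
        buffer = io.BytesIO()
        df.to_parquet(buffer, index=False)
        return buffer.getvalue()
    raise ValueError(f"Unsupported format: {fmt!r}")


def upload_dataframe(df, directory_client, file_name, fmt="csv"):
    """Upload a DataFrame as one file under the given directory (hypothetical helper)."""
    file_client = directory_client.get_file_client(file_name)
    # overwrite=True replaces the remote file if it already exists.
    file_client.upload_data(dataframe_to_bytes(df, fmt), overwrite=True)
    return file_client


# Usage, assuming dfResult and directory_client exist as in the code above:
# upload_dataframe(dfResult, directory_client, "result.csv")
# upload_dataframe(dfResult, directory_client, "result.parquet", fmt="parquet")
```

Keeping the serialization separate from the upload makes it easy to check the output format locally before touching the storage account.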