Writing files to Azure Data Lake from AzureML


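I am using a "data asset" to read data from Azure Data Lake into my AzureML workspace.

However, I would like to know how to write data to Azure Data Lake. I have a pandas DataFrame that I want to save to the data lake as CSV/Parquet.

Code: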

import mltable
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

mlClient = MLClient.from_config(credential=DefaultAzureCredential())
dataAsset = mlClient.data.get(name="MyDataAsset", version="1")

pathTest = {
    'folder': dataAsset.path
}

tblTest = mltable.from_parquet_files(paths=[pathTest])
dfBaseTest = tblTest.to_pandas_dataframe()  # ok, here is my pandas dataframe

##############
# ML operations ..... result: dfResult
#
# How can I save dfResult to my data lake? Is it possible to use the data
# asset "MyDataAsset" for this, or are data assets read-only?
##############
azure azure-data-lake azure-machine-learning-service
1 Answer

0 votes

One possible solution for uploading data is to use the Azure Data Lake Storage client library for Python:

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Replace <Account-Name> with the name of your storage account.
account_url = "https://<Account-Name>.dfs.core.windows.net"
token_credential = DefaultAzureCredential()

service_client = DataLakeServiceClient(account_url, credential=token_credential)
# create_file_system raises ResourceExistsError if the file system already exists;
# use service_client.get_file_system_client("dataasset2") in that case.
file_system_client = service_client.create_file_system(file_system="dataasset2")
directory_client = file_system_client.create_directory("test")
file_client = directory_client.get_file_client("data.csv")

# Upload the local file data.csv, replacing any existing remote file.
with open("data.csv", mode="rb") as data:
    file_client.upload_data(data, overwrite=True)
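Since the question starts from a pandas DataFrame rather than a file on disk, the same upload can probably be done without writing a local file first. The following is only a minimal sketch under some assumptions: `dataframe_to_bytes` and `upload_dataframe` are hypothetical helper names, not part of any Azure library, the target file system is assumed to exist already, and the Parquet branch assumes `pyarrow` or `fastparquet` is installed.

```python
import io

import pandas as pd


def dataframe_to_bytes(df: pd.DataFrame, fmt: str = "csv") -> bytes:
    """Serialize a DataFrame to CSV or Parquet bytes in memory (hypothetical helper)."""
    if fmt == "csv":
        return df.to_csv(index=False).encode("utf-8")
    if fmt == "parquet":
        buffer = io.BytesIO()
        # Requires a Parquet engine such as pyarrow or fastparquet.
        df.to_parquet(buffer, index=False)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")


def upload_dataframe(df, account_url, file_system, path, fmt="csv"):
    """Upload a DataFrame to an existing file system (hypothetical helper)."""
    # Imported here so the serialization helper works without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service_client = DataLakeServiceClient(
        account_url, credential=DefaultAzureCredential()
    )
    # Assumes the file system (container) already exists.
    file_system_client = service_client.get_file_system_client(file_system)
    file_client = file_system_client.get_file_client(path)
    file_client.upload_data(dataframe_to_bytes(df, fmt), overwrite=True)
```

With the names from the question, a call might look like `upload_dataframe(dfResult, "https://<Account-Name>.dfs.core.windows.net", "dataasset2", "test/result.parquet", fmt="parquet")`. The identity behind `DefaultAzureCredential` needs write permission on the storage account for this to succeed.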

