How do I flatten data received from an arbitrary container in Azure Data Factory?

Question

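I am trying to download some data from a REST API using Azure Data Factory. I have successfully downloaded it to Azure Storage as a JSON file, but the data I receive looks like this: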

{
    "@odata.context": "https://url/endpoint",
    "@odata.nextLink": "https://url/endpoint?$skiptoken=token",
    "value": [ { Page1Item1 }, { Page1Item2 } ]
}

{
    "@odata.context": "https://url/endpoint",
    "@odata.nextLink": "https://url/endpoint?$skiptoken=token",
    "value": [ { Page2Item1 }, { Page2Item2 } ]
}

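These documents are separated by newlines in the JSON file, one page per line. I want to unroll them and produce a JSON file with one row per item across all pages, with the item fields as columns. For example, if the item data structure is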

{
    "name": "Name",
    "desc": "Description"
}

then I want the output file to be (the order of the items does not matter):

{ "name": "Page1Item1Name", "desc": "Page1Item1Desc" }
{ "name": "Page1Item2Name", "desc": "Page1Item2Desc" }
{ "name": "Page2Item1Name", "desc": "Page2Item1Desc" }
{ "name": "Page2Item2Name", "desc": "Page2Item2Desc" }
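To make the intended transformation concrete, here is a minimal sketch in plain Python, run outside Azure Data Factory. It is not how the data flow is implemented; it only shows the same unroll-by-value logic on a document-per-line file. The function name flatten_pages and the sample strings are hypothetical, built from the placeholder item names used above.

```python
import json


def flatten_pages(lines):
    """Yield each item of the "value" array from every JSON document (one per line)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        page = json.loads(line)
        # A page without a "value" array contributes no items.
        yield from page.get("value", [])


# Hypothetical sample input: two pages, one JSON document per line.
sample = [
    '{"@odata.context": "https://url/endpoint", "value": ['
    '{"name": "Page1Item1Name", "desc": "Page1Item1Desc"}, '
    '{"name": "Page1Item2Name", "desc": "Page1Item2Desc"}]}',
    '{"@odata.context": "https://url/endpoint", "value": ['
    '{"name": "Page2Item1Name", "desc": "Page2Item1Desc"}, '
    '{"name": "Page2Item2Name", "desc": "Page2Item2Desc"}]}',
]

# One JSON object per output line, matching the desired file layout.
flattened = [json.dumps(item) for item in flatten_pages(sample)]
print("\n".join(flattened))
```

The @odata.context and @odata.nextLink fields are dropped, and only the contents of value survive, which is the result wanted from the data flow.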

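I have managed to do this with a data flow that uses the Flatten transformation, unrolling by value. However, to get the data flow to run in a pipeline I have to use an inline source to read the JSON file, and if I try to use a dynamic expression in the source's container field, the pipeline will not start and returns this error: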

"code":"BadRequest",
"message":null,
"target":"pipeline//runid/{run guid}",
"details":null,
"error":null

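When I look at the source's script definition, the container appears as 'containername' if I hard-code it (which works) and as ($parametername) if I link it to a parameter (note the auto-generated parentheses), which does not work. The same error occurs if I create a dataset and use a dataset parameter.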

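The sink dataset of the same data flow is parameterized with @dataset().clientname, and that works. By changing individual parts of the data flow, I have determined that the problem is in the source, specifically in the container name field.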

How can I have the pipeline specify which container the data flow should read from?

Here is the data flow script:

parameters{
    clientname as string
}
source(output(
        {@odata.context} as string,
        {@odata.nextLink} as string,
        value as (businessPhones as string[], displayName as string, givenName as string, id as string, jobTitle as string, mail as string, mobilePhone as string, officeLocation as string, preferredLanguage as string, surname as string, userPrincipalName as string)[]
    ),
    useSchema: false,
    allowSchemaDrift: true,
    validateSchema: false,
    ignoreNoFilesFound: false,
    format: 'json',
    container: 'containername',
    folderPath: 'intune',
    fileName: 'users.json',
    documentForm: 'documentPerLine') ~> loadjsonusers
loadjsonusers foldDown(unroll(value),
    mapColumn(
        businessPhones = value.businessPhones,
        displayName = value.displayName,
        givenName = value.givenName,
        jobTitle = value.jobTitle,
        mail = value.mail,
        mobilePhone = value.mobilePhone,
        officeLocation = value.officeLocation,
        preferredLanguage = value.preferredLanguage,
        surname = value.surname,
        userPrincipalName = value.userPrincipalName,
        id = value.id
    ),
    skipDuplicateMapInputs: false,
    skipDuplicateMapOutputs: false) ~> flattenbyvalue
flattenbyvalue sink(allowSchemaDrift: true,
    validateSchema: false,
    partitionFileNames:['users-flattened.json'],
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true,
    partitionBy('hash', 1)) ~> saveflattenedusers

If I try to parameterize it, the container line is replaced with:

container: ($clientname),

Answer 1

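The data flow script you provided is correct. The problem appears to be in the value you supply for the data flow parameter. You can follow the steps below to set the value of a data flow parameter from the pipeline.

Add a Data Flow activity to the pipeline and specify the data flow name.

In the Parameters tab of the Data Flow activity, set the value of the data flow parameter by clicking the data flow expression option and specifying the file name.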
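As a rough illustration of what those steps might produce, the sketch below builds the data flow reference section of an Execute Data Flow activity as a Python dictionary and prints it as JSON. This is an assumption about the exported pipeline JSON, not something confirmed in this thread: the data flow name flattenusers and the pipeline parameter clientname are hypothetical, and the single quotes around the interpolated pipeline expression reflect the common convention for passing a string into a data flow string parameter. Compare it against the JSON your own factory generates before relying on it.

```python
import json

# Hypothetical names: the data flow "flattenusers" and the pipeline
# parameter "clientname" are assumptions for illustration only.
pipeline_parameter = "clientname"

dataflow_reference = {
    "referenceName": "flattenusers",
    "type": "DataFlowReference",
    "parameters": {
        "clientname": {
            # The surrounding single quotes turn the interpolated pipeline
            # value into a string literal in the data flow expression language.
            "value": "'@{pipeline().parameters." + pipeline_parameter + "}'",
            "type": "Expression",
        }
    },
}

print(json.dumps(dataflow_reference, indent=2))
```

If the value arrives in the data flow without quotes, it would be read as an expression rather than a string, which is one plausible cause of a request being rejected before the run starts.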
