Count the categories in a specific column using ADF (Azure Data Factory)

Question

My data has a column named category. The categories are chair and table.

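I tried the Filter and Derived Column transformations, but Derived Column gives me extra columns, whereas I want the result at the row level.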

Answer 1

Because the remaining values do not exist in the column, the data flow cannot produce counts for them on its own.

You can follow the approach below to achieve your requirement. It requires an array containing all possible category values, for example ['chair','table','bowl','plat'].

This is the starting data I used in the data flow source:

category
chair
table
chair
table
table
chair
chair
table

First, use an Aggregate transformation to get the category and count columns.

In the Group by section, specify the category column, and in the Aggregates section, create a new column named count with the expression count(category).

This gives the counts for the values that exist in the column.
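
As a rough illustration of what this first Aggregate step computes, here is a minimal Python sketch. It is not ADF code; the function name count_by_category is hypothetical, and the input is the sample data listed above.

```python
from collections import Counter

# Sample data from the data flow source shown above.
categories = ["chair", "table", "chair", "table",
              "table", "chair", "chair", "table"]


def count_by_category(values):
    """Mimic Group by category + count(category).

    Returns one (category, count) pair per distinct value, sorted by name.
    """
    return sorted(Counter(values).items())


existing_counts = count_by_category(categories)
print(existing_counts)  # [('chair', 4), ('table', 4)]
```

Only chair and table appear here, which is why the categories that are absent from the data need the extra steps that follow.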

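Now add another Aggregate transformation after this one. Leave the Group by section empty, and in the Aggregates section, use the collect(category) expression to create a column (original) that holds the array of existing categories.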

This produces a single row whose column contains the array of existing categories.

Next, add a Derived Column transformation and create two columns from the original array column above and the array of all categories.

rem_items - toString(unfold(except(['chair','table','bowl','plat'],original)))

count2 - 0

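This first generates an array of the categories that are not present in the data and unfolds that array into rows. The values are then converted into a string-type column. For the count2 column, whatever values come out of this transformation, their count is always 0.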
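
The logic of the rem_items and count2 columns can be sketched in plain Python as follows. This is only an analogy for except, unfold and toString, not ADF code; the name missing_rows is hypothetical, and preserving the order of the full category list is an assumption of this sketch.

```python
ALL_CATEGORIES = ["chair", "table", "bowl", "plat"]


def missing_rows(all_categories, original):
    """Rough analogue of toString(unfold(except(all, original))) with count2 = 0."""
    present = set(original)
    # except: keep entries of the full list that are absent from `original`
    rem_items = [c for c in all_categories if c not in present]
    # unfold: one output row per array element, each paired with a count of 0
    return [(str(item), 0) for item in rem_items]


print(missing_rows(ALL_CATEGORIES, ["chair", "table"]))
# [('bowl', 0), ('plat', 0)]
```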

Now use a Select transformation to remove the extra original column.

The Select transformation will then contain only two columns.

Now create a New branch after the aggregate1 transformation and add a Union transformation to it. Use the select1 transformation above as the other input, and union by position.

This gives the desired result.
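
To make the union by position concrete, here is a small Python sketch. It is an illustration only, not ADF code; union_by_position is a hypothetical helper, and the row values are the ones expected for the sample data and the four-category list used in this answer.

```python
def union_by_position(first, second):
    """Stack two row lists, matching columns by position rather than by name."""
    if first and second and len(first[0]) != len(second[0]):
        raise ValueError("both inputs need the same number of columns")
    return list(first) + list(second)


aggregate1_rows = [("chair", 4), ("table", 4)]  # columns: category, count
select1_rows = [("bowl", 0), ("plat", 0)]       # columns: rem_items, count2

result = union_by_position(aggregate1_rows, select1_rows)
for category, count in result:
    print(category, count)
```

Because the columns are matched by position, rem_items lines up under category and count2 under count, even though the names differ between the two inputs.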

Add your sink to the Union transformation, and you can then run the data flow.