We already have an Azure Data Factory in place for other processes we run, so we are trying to use it to solve the following problem:
What we already have:
What we do not have yet but the final solution may require:
Use case:
As for the storage solution: since it only serves this process itself and nothing is reused daily or by any other process, I cannot tell what is best to use (a DB? Blob? the file system itself?), but if it does not cause any problems I will stick with the file system.
I have only just started using Azure products, so it is all a bit confusing, but I will get there.
(Parent pipeline)
- Get Metadata activity 1 - use the `childItems` field and set the filter-by-last-modified option based on your pipeline run interval timings, so that only the files modified after the given date are returned.
- Get Metadata activity 2 - get all child items.
- ForEach activity - iterate over the child items from Get Metadata activity 1.
- Filter activity - filter the current filename `item().name` out of the Get Metadata activity 2 child-items array.
- Execute Pipeline activity (child pipeline) - pass the current filename and the Filter activity output array as parameters to the child pipeline.
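The parent-pipeline flow above can be sketched in Python (a model only, not ADF code; the file names and cutoff timestamp are illustrative):

```python
from datetime import datetime

def plan_comparisons(files, cutoff):
    """Model of the parent pipeline.

    files:  dict mapping file name -> last-modified datetime.
    cutoff: only files modified after this datetime are processed.

    Returns one (current_name, other_names) pair per child-pipeline
    invocation: the recently modified file plus the list of every
    *other* file name (the Filter activity's output).
    """
    recent = [n for n, ts in files.items() if ts > cutoff]  # Get Metadata 1
    all_names = list(files)                                 # Get Metadata 2
    plans = []
    for name in recent:                                     # ForEach
        others = [n for n in all_names if n != name]        # Filter activity
        plans.append((name, others))                        # Execute Pipeline params
    return plans
```

Each returned pair corresponds to the two parameters passed to the child pipeline.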
(Child pipeline)
- Lookup activity 1 - look up the passed filename and read its XML content as a JSON array.
- Set Variable activity - create a Boolean variable `Flag` with its value set to `false`.
- ForEach activity - iterate over the passed Filter output array.
    - Lookup activity 2 - get the content of the current filename in the loop.
    - Set Variable activity (Integer type) - get the length of the intersection of the Lookup 1 and Lookup 2 arrays with an expression like `@length(intersection(<lookup1 array>, <lookup2 array>))`.
    - If Condition activity - check whether this length is greater than or equal to 1 (i.e., the two files share at least one record).
        - True activities:
            - Set Variable activity - set the Boolean variable `Flag` to `true`.
- If Condition activity - check whether the Boolean variable `Flag` is `true`.
    - False activities:
        - Copy activity - copy the passed filename to a `temp` folder.
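The child pipeline's duplicate check can be modeled in Python (a sketch; `current_rows` and `other_files` are hypothetical stand-ins for the Lookup outputs, not ADF names):

```python
def is_duplicate(current_rows, other_files):
    """Model of the child pipeline.

    current_rows: rows of the current file (Lookup activity 1).
    other_files:  dict mapping other file names -> their rows
                  (one Lookup activity 2 call per loop iteration).

    Returns the final value of the Flag variable: True when any
    other file shares at least one row with the current file.
    """
    flag = False                                         # Set Variable: Flag = false
    for rows in other_files.values():                    # ForEach over Filter output
        common = [r for r in current_rows if r in rows]  # intersection()
        if len(common) >= 1:                             # If: length >= 1
            flag = True                                  # Set Variable: Flag = true
    return flag  # the Copy-to-temp branch runs only when this is False
```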
- Copy activity - copy all files from the `temp` folder to the F2 folder as a single `.zip` file. Use Binary datasets for both source and sink, a wildcard file path in the source, and `ZipDeflate` compression on the sink dataset.
- Delete activity - delete all files in the `temp` folder using a wildcard file path. Use a Binary dataset for this as well.
- Delete activity - if you want to clean out the old files in the F1 folder before the next pipeline run, delete all files in the F1 folder in the same way as the previous step.
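For reference, the sink Binary dataset that produces the `.zip` output could look roughly like the following JSON (a sketch; the dataset name, the linked service `MyStorageLS`, and the container/folder names are assumptions, not from the original):

```json
{
  "name": "ZipSinkBinary",
  "properties": {
    "type": "Binary",
    "linkedServiceName": {
      "referenceName": "MyStorageLS",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "data",
        "folderPath": "F2"
      },
      "compression": {
        "type": "ZipDeflate",
        "level": "Optimal"
      }
    }
  }
}
```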
Here, you need to use dataset parameters for the datasets used inside the ForEach loop and in the child pipeline. Check this SO answer
to understand the usage of dataset parameters in ADF pipelines. If the file size exceeds a given number of rows, it is better to use another service such as Azure Functions or Logic Apps.
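A parameterized Binary dataset could look roughly like this (a sketch; the dataset name, linked service, and container are assumptions; `@dataset().fileName` is how a dataset parameter is referenced in ADF dynamic content):

```json
{
  "name": "ParamBinary",
  "properties": {
    "type": "Binary",
    "parameters": {
      "fileName": { "type": "string" }
    },
    "linkedServiceName": {
      "referenceName": "MyStorageLS",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "data",
        "folderPath": "F1",
        "fileName": {
          "value": "@dataset().fileName",
          "type": "Expression"
        }
      }
    }
  }
}
```

Each activity that uses this dataset (e.g., the Lookup inside the ForEach) then supplies `fileName` per iteration.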