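I have been looking for an example of how to set up CloudFormation for a Glue workflow that includes triggers, jobs, and crawlers, but I have not found much information about it.
This is the only information I could find from AWS: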
{
  "Type" : "AWS::Glue::Workflow",
  "Properties" : {
    "DefaultRunProperties" : Json,
    "Description" : String,
    "Name" : String,
    "Tags" : Json
  }
}
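Here is an example of a workflow with one crawler and a job that runs after the crawler finishes.
The workflow is defined by marking each trigger with WorkflowName.
I believe only one SCHEDULED or ON_DEMAND trigger can start the workflow. All other triggers in the workflow must be CONDITIONAL on jobs or crawlers. This is probably how CloudFormation knows how to build the DAG.
Also note how the workflow parameters are defined as JSON in DefaultRunProperties.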
---
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  BaseBucket:
    Description: Bucket used by my workflow jobs
    Type: String
Resources:
  MyWorkflow:
    Type: AWS::Glue::Workflow
    Properties:
      DefaultRunProperties:
        {
          "workflowParameter1": "Foo",
          "workflowParameter2": "Bar",
          "bucket": { "Fn::Sub": "${BaseBucket}" }
        }
      Description: Workflow for orchestrating my jobs
      Name: MyWorkflowName
  WorkflowCrawler:
    Type: AWS::Glue::Crawler
    Properties:
      Name: MyCrawler
      Role: MyCrawlerRole
      Description: A crawler to run as the first step in the workflow
      DatabaseName: MyDatabase
      Targets:
        S3Targets:
          - Path: !Sub "s3://${BaseBucket}/"
  WorkflowJob:
    Type: AWS::Glue::Job
    Properties:
      Description: Glue job to run after the crawler
      Name: MyWorkflowJob
      Role: MyJobRole
      Command:
        Name: pythonshell
        PythonVersion: 3
        ScriptLocation: !Sub "s3://${BaseBucket}/my_workflow_job_script.py"
  WorkflowStartTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: StartTrigger
      Type: ON_DEMAND
      Description: Trigger for starting the workflow
      Actions:
        - CrawlerName: !Ref WorkflowCrawler
      WorkflowName: !Ref MyWorkflow
  WorkflowJobTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: CrawlerSuccessfulTrigger
      Type: CONDITIONAL
      StartOnCreation: True
      Description: Trigger to start the glue job
      Actions:
        - JobName: !Ref WorkflowJob
      Predicate:
        Conditions:
          - LogicalOperator: EQUALS
            CrawlerName: !Ref WorkflowCrawler
            CrawlState: SUCCEEDED
      WorkflowName: !Ref MyWorkflow
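
For completeness, here is a rough sketch of how my_workflow_job_script.py might read the values set in DefaultRunProperties at run time. It is not part of the template above. It assumes Glue passes --WORKFLOW_NAME and --WORKFLOW_RUN_ID to a job started by a workflow, as the AWS documentation on workflow run properties describes, and that boto3 is available in the job environment. The function names (parse_workflow_args, read_run_properties, main) are made up for illustration.

```python
import argparse


def parse_workflow_args(argv):
    """Pull the workflow name and run id out of the job's argument list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--WORKFLOW_NAME", required=True)
    parser.add_argument("--WORKFLOW_RUN_ID", required=True)
    # Glue passes other arguments as well; ignore the ones not needed here.
    known, _unknown = parser.parse_known_args(argv)
    return known.WORKFLOW_NAME, known.WORKFLOW_RUN_ID


def read_run_properties(glue_client, workflow_name, run_id):
    """Return the run properties of one workflow run as a plain dict."""
    response = glue_client.get_workflow_run_properties(
        Name=workflow_name, RunId=run_id
    )
    return response["RunProperties"]


def main(argv):
    # Assumption: boto3 is installed in the Glue job environment.
    import boto3

    name, run_id = parse_workflow_args(argv)
    props = read_run_properties(boto3.client("glue"), name, run_id)
    # Keys match the DefaultRunProperties defined in the template.
    print(props["bucket"], props["workflowParameter1"])


# Inside the Glue job script, finish with:
# import sys; main(sys.argv[1:])
```

The Glue client is passed in as an argument so the lookup can be tried locally with a stub before deploying the job. The role used by the job (MyJobRole here) would also need permission to call glue:GetWorkflowRunProperties.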