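I'm using Terraform to create an ECS cluster backed by EC2 instances. My goal is to run a single task on just one EC2 instance. I manage the cluster's capacity provider and auto scaling myself.
Initially, deploying the task to the EC2 instance works fine. However, when I deploy a new task definition to replace the existing task, ECS leaves the new task in the PROVISIONING state. It stays there until I change the Auto Scaling group's max_size from 1 to 2. Once I do that, the new task deployment completes on a new EC2 instance, and after a while the previous instance is removed by an action tied to a CloudWatch alarm (CapacityProviderReservation < 100 for 15 datapoints within 15 minutes).
For now, in my non-production environment, I want to keep only one instance in the cluster and be able to deploy the same task repeatedly onto that same instance.
Terraform code: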
# ECS service
resource "aws_ecs_service" "this" {
  name     = "cluster"
  iam_role = aws_iam_role.ecs_role.arn
  cluster  = aws_ecs_cluster.cluster.id

  # The task definition resource below is declared as "task"
  task_definition      = aws_ecs_task_definition.task.arn
  desired_count        = 1
  force_new_deployment = true

  load_balancer {
    target_group_arn = aws_alb_target_group.lb.arn
    container_name   = aws_ecs_task_definition.task.family
    container_port   = 80
  }

  ordered_placement_strategy {
    type  = "binpack"
    field = "memory"
  }

  capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.ecs_capacity_provider.name
    base              = 1
    weight            = 100
  }

  lifecycle {
    create_before_destroy = true
    ignore_changes = [
      desired_count
    ]
  }
}
# Auto scaling
resource "aws_autoscaling_group" "ecs_asg" {
  name                  = "asg"
  vpc_zone_identifier   = [for subnet in var.public_subnet_ids : subnet]
  max_size              = 1
  min_size              = 1
  desired_capacity      = 1
  health_check_type     = "EC2"
  protect_from_scale_in = false

  launch_template {
    id      = aws_launch_template.template.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
  }

  lifecycle {
    create_before_destroy = true
  }
}
# Capacity provider
resource "aws_ecs_capacity_provider" "ecs_capacity_provider" {
  name = "ecs_capacity_provider"

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.ecs_asg.arn
    managed_termination_protection = "DISABLED"

    managed_scaling {
      maximum_scaling_step_size = 2
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }
  }
}

resource "aws_ecs_cluster_capacity_providers" "ecs_capacity_providers" {
  cluster_name       = aws_ecs_cluster.cluster.name
  capacity_providers = [aws_ecs_capacity_provider.ecs_capacity_provider.name]

  default_capacity_provider_strategy {
    base              = 1
    weight            = 100
    capacity_provider = aws_ecs_capacity_provider.ecs_capacity_provider.name
  }
}
# ECS task
resource "aws_ecs_task_definition" "task" {
  family = "task"

  container_definitions = jsonencode([
    {
      name  = "task"
      image = "test"
      # CPU units and memory (MiB) are integers in the ECS container definition
      cpu       = 768
      memory    = 4096
      essential = true
      portMappings = [
        {
          containerPort = 80
          # Fixed host port: only one copy of this task fits on an instance
          hostPort = 80
          protocol = "tcp"
        }
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.logs.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "app"
        }
      }
    }
  ])

  execution_role_arn = aws_iam_role.ecs_exec.arn
  task_role_arn      = aws_iam_role.ecs_task.arn
}
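I noticed that when I attempt the second task deployment, the CloudWatch alarm linked to auto scaling is triggered:
Alarm: "TargetTracking-test-ecs-asg-AlarmHigh-e5a4556-5686-5669-26546-e745a5ed90cb"
If I set the ECS service's maximum percent (deployment_maximum_percent) to 200, ECS tries to create the new task on the same EC2 instance but fails because of a container port conflict. This is expected: by default, ECS tries to start the new service task and confirm it is healthy before stopping the old one.
For my non-production environment, some downtime during a new deployment is acceptable, and I want a single instance in the cluster that the same task can be redeployed onto. So I can set maximumPercent (deployment_maximum_percent) to 100 and minimumHealthyPercent (deployment_minimum_healthy_percent) to 0. With this configuration, ECS first stops the running task and then starts the new one on the same instance.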
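As a minimal sketch of that change, the two deployment arguments can be added to the existing aws_ecs_service resource. Only the relevant arguments are shown here; the rest of the service block is assumed to stay as posted above, and the values suit only environments where a short outage during each deployment is acceptable.

```hcl
resource "aws_ecs_service" "this" {
  name          = "cluster"
  desired_count = 1

  # Never run more than desired_count tasks at once, so ECS does not try
  # to place a second task that would need the same host port 80.
  deployment_maximum_percent = 100

  # Allow the running task count to drop to zero during a deployment, so
  # ECS can stop the old task before starting the new one.
  deployment_minimum_healthy_percent = 0

  # ... remaining arguments (cluster, task_definition, load_balancer,
  # capacity_provider_strategy, and so on) unchanged from the service above
}
```

With desired_count = 1, these values mean a deployment stops the single running task first and then starts its replacement on the same instance, so the Auto Scaling group can stay at max_size = 1.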