Using Terraform, I deployed a Kubernetes cluster in AWS (EKS), and everything went smoothly. The problem appears whenever I try to change a node group or create a new one.
I used the following code:
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "19.15.3"
cluster_name = var.cluster_name
cluster_version = "1.27"
# EKS Addons
cluster_addons = {
coredns = {
most_recent = true # To ensure access to the latest settings provided
}
kube-proxy = {
most_recent = true # To ensure access to the latest settings provided
}
vpc-cni = {
# Specify the VPC CNI addon should be deployed before compute to ensure
# the addon is configured before data plane compute resources are created
before_compute = true
most_recent = true # To ensure access to the latest settings provided
configuration_values = jsonencode({
})
}
}
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
cluster_endpoint_public_access = true
# Calico needs VXLAN communication between nodes
node_security_group_additional_rules = {
ingress_self_all = {
description = "Node to node all ports/protocols"
protocol = "-1"
from_port = 0
to_port = 0
type = "ingress"
self = true
}
ingress_cluster_to_node_all = {
description = "API Server to nodes all ports/protocols"
protocol = "-1"
from_port = 0
to_port = 0
type = "ingress"
source_cluster_security_group = true
}
egress_all = {
description = "Node all egress"
protocol = "-1"
from_port = 0
to_port = 0
type = "egress"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
}
eks_managed_node_group_defaults = {
ami_type = "AL2_x86_64"
}
eks_managed_node_groups = {
default_nodes_groups = {
name = "node-group-1"
instance_types = ["t3.small"]
min_size = 3
max_size = 3
desired_size = 3
}
}
manage_aws_auth_configmap = true
aws_auth_roles = [
{
rolearn = module.eks_admins_iam_role.iam_role_arn
username = module.eks_admins_iam_role.iam_role_name
groups = ["system:masters"]
},
]
}
The VPC is very basic. For the IAM role tied to the RBAC group system:masters, I followed this very detailed reference: reference link
Basically, I created an IAM role that gets mapped to the system:masters group (this is the role placed in the aws_auth_roles section above); a rough sketch of that module is just below.
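For completeness, the role module looks roughly like this (a minimal sketch; the role name, trusted ARNs, and attached policy are placeholders, the real values follow the linked reference):
module "eks_admins_iam_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "5.28.0"

  create_role       = true
  role_name         = "${var.cluster_name}-eks-admin" # placeholder name
  role_requires_mfa = false

  # Placeholder: the IAM principals allowed to assume the admin role
  trusted_role_arns = ["arn:aws:iam::111111111111:root"]

  # Placeholder: a policy granting eks:DescribeCluster etc., per the reference
  custom_role_policy_arns = [aws_iam_policy.eks_admin.arn]
}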
Then, with the following configuration, Terraform can interact with the cluster during creation/editing:
data "aws_eks_cluster_auth" "default" {
name = var.cluster_name
depends_on = [ module.eks.eks_managed_node_groups, ]
}
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
token = data.aws_eks_cluster_auth.default.token
}
provider "helm" {
kubernetes {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
token = data.aws_eks_cluster_auth.default.token
}
}
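(For reference, the provider docs also describe an exec-based authentication that fetches a fresh token via the AWS CLI on every API call instead of caching one from the data source; a sketch, assuming the aws CLI is on the PATH, not what I am currently using:)
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  # Fetch a fresh token on every API call instead of reusing the
  # data-source token computed at plan time
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
  }
}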
Finally, I added Calico and the AWS Load Balancer Controller:
# Calico addon to exploit Network Security Policies in EKS
module "kubernetes_addons_calico" {
  count = var.calico_network_policies_enabled ? 1 : 0

  source = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons?ref=v4.32.1"

  eks_cluster_id = module.eks.cluster_name
  enable_calico  = true

  calico_helm_config = {
    name       = "calico"                                # (Required) Release name.
    repository = "https://docs.projectcalico.org/charts" # (Optional) Repository URL where to locate the requested chart.
    chart      = "tigera-operator"                       # (Required) Chart name to be installed.
    version    = "v3.26.1"                               # (Optional) Specify the exact chart version to install.
    namespace  = "tigera-operator"                       # (Optional) The namespace to install the release into.

    values = [
      <<-EOT
        installation:
          kubernetesProvider: EKS
      EOT
    ]
  }
}
module "lb_role" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "5.28.0"
role_name = "${var.cluster_name}-load-balancer-controller"
attach_load_balancer_controller_policy = true
oidc_providers = {
main = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["kube-system:aws-load-balancer-controller"]
}
}
}
# Deploy the AWS Load Balancer Controller
# (it also creates the service account)
resource "helm_release" "lb_controller" {
  name       = "aws-load-balancer-controller"
  chart      = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  version    = "1.5.5"
  namespace  = "kube-system"

  set {
    name  = "clusterName"
    value = var.cluster_name
  }
  set {
    name  = "rbac.create"
    value = "true"
  }
  set {
    name  = "serviceAccount.create"
    value = "true"
  }
  set {
    name  = "serviceAccount.name"
    value = "aws-load-balancer-controller"
  }
  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.lb_role.iam_role_arn
  }
}
Everything works fine, and I can do anything I need on the cluster. However, if I try to add nodes to the node group, or to create a new node group, I get the following errors:
│ Error: query: failed to query with labels: secrets is forbidden: User "system:anonymous" cannot list resource "secrets" in API group "" in the namespace "tigera-operator"
│
│ with module.k8s_control_plane.module.kubernetes_addons_calico[0].module.calico[0].module.helm_addon.helm_release.addon[0],
│ on .terraform/modules/k8s_control_plane.kubernetes_addons_calico/modules/kubernetes-addons/helm-addon/main.tf line 1, in resource "helm_release" "addon":
│ 1: resource "helm_release" "addon" {
│
╵
╷
│ Error: configmaps "aws-auth" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
│
│ with module.k8s_control_plane.module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│ on .terraform/modules/k8s_control_plane.eks/main.tf line 553, in resource "kubernetes_config_map_v1_data" "aws_auth":
│ 553: resource "kubernetes_config_map_v1_data" "aws_auth" {
│
╵
╷
│ Warning: Argument is deprecated
│
│ with module.k8s_control_plane.module.eks.aws_eks_addon.before_compute["vpc-cni"],
│ on .terraform/modules/k8s_control_plane.eks/main.tf line 420, in resource "aws_eks_addon" "before_compute":
│ 420: resolve_conflicts = try(each.value.resolve_conflicts, "OVERWRITE")
│
│ The "resolve_conflicts" attribute can't be set to "PRESERVE" on initial
│ resource creation. Use "resolve_conflicts_on_create" and/or
│ "resolve_conflicts_on_update" instead
╵
╷
│ Error: query: failed to query with labels: secrets is forbidden: User "system:anonymous" cannot list resource "secrets" in API group "" in the namespace "kube-system"
│
│ with module.k8s_control_plane.helm_release.lb_controller,
│ on ../../../modules/eks-ctrl-plane/aws-lb.tf line 26, in resource "helm_release" "lb_controller":
│ 26: resource "helm_release" "lb_controller" {
│
╵
╷
│ Warning: Argument is deprecated
│
│ with module.k8s_control_plane.module.eks.aws_eks_addon.this["kube-proxy"],
│ on .terraform/modules/k8s_control_plane.eks/main.tf line 392, in resource "aws_eks_addon" "this":
│ 392: resolve_conflicts = try(each.value.resolve_conflicts, "OVERWRITE")
│
│ The "resolve_conflicts" attribute can't be set to "PRESERVE" on initial
│ resource creation. Use "resolve_conflicts_on_create" and/or
│ "resolve_conflicts_on_update" instead
╵
╷
│ Warning: Argument is deprecated
│
│ with module.k8s_control_plane.module.eks.aws_eks_addon.this["coredns"],
│ on .terraform/modules/k8s_control_plane.eks/main.tf line 392, in resource "aws_eks_addon" "this":
│ 392: resolve_conflicts = try(each.value.resolve_conflicts, "OVERWRITE")
│
│ The "resolve_conflicts" attribute can't be set to "PRESERVE" on initial
│ resource creation. Use "resolve_conflicts_on_create" and/or
│ "resolve_conflicts_on_update" instead
╵
Operation failed: failed running terraform plan (exit 1)
Note that the warnings also appeared in earlier, successful runs, and I'm fairly sure they are harmless (I included them for completeness).
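(For the deprecation warnings alone, the message itself points at the replacement attributes; a sketch of how an addon entry inside the module "eks" block might look with them, assuming a module/provider version that forwards these to aws_eks_addon:)
cluster_addons = {
  coredns = {
    most_recent                 = true
    resolve_conflicts_on_create = "OVERWRITE" # replaces the deprecated resolve_conflicts
    resolve_conflicts_on_update = "OVERWRITE"
  }
}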
The two main changes I tried (not at the same time) are:
default_nodes_groups = {
  name           = "node-group-1"
  instance_types = ["t3.small"]
  min_size       = 3
  max_size       = 5
  desired_size   = 5
}
and:
eks_managed_node_groups = {
  default_nodes_groups = {
    name           = "node-group-1"
    instance_types = ["t3.small"]
    min_size       = 3
    max_size       = 3
    desired_size   = 3
  }
  extra_nodes_groups = {
    name           = "node-group-2"
    instance_types = ["t3.small"]
    min_size       = 1
    max_size       = 2
    desired_size   = 1
  }
}
With either change, everything fails in the same way.
I have almost no idea what the problem could be. I tried updating things and went back and forth trying to understand why these changes are executed as the system:anonymous user, but I really don't get it. Also, why does this change need to query secrets in the cluster at all, and why are those queries performed anonymously rather than by the Kubernetes provider, which has all the required permissions?
I assumed Kubernetes operations were executed through the Kubernetes provider with system:masters RBAC permissions (and indeed I can update all the addons/resources, create/destroy them, etc.), but it seems that changes to node groups use something else. I checked the Terraform user's permissions, but it already has a full administrator policy, so if the change were performed through its IAM permissions, it could basically do anything on EKS.
My only remaining theory is that somehow, only during edits (not creation — which is what makes these queries so strange), the EKS module talks to the Kubernetes API as the Terraform user (instead of through the Kubernetes provider with the assumed role's system:masters RBAC permissions) and edits the cluster, which could explain system:anonymous... It seems to me this could be the problem, but then why would changing the cluster behave differently from creating it in the first place?
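The only pattern in my configuration I can think of is the depends_on on the token data source; a sketch of what I suspect (unverified):
# Unverified suspicion: when the node groups change, this depends_on forces
# the data source to be re-read at apply time, so during plan the
# kubernetes/helm providers receive an unknown token and fall back to
# system:anonymous.
data "aws_eks_cluster_auth" "default" {
  name = var.cluster_name

  # depends_on = [module.eks.eks_managed_node_groups]  # variant without the dependency
}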
Does anyone know what the problem might be and how to fix it, so that I can manually scale nodes up and down from Terraform?