Terraform and Helm cannot reach the Kubernetes cluster

Question

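I'm using Terraform to create a GKE cluster for my thesis. At first I was able to create it, but then I added Istio, Prometheus, etc., so I destroyed the cluster and recreated it with all of these. Since then I keep getting the same error: Kubernetes cluster unreachable. I've already checked for credential issues and added a service account, but it doesn't help.

I think it's a credentials problem with Helm, which I use to install Istio and its add-ons. It could also be a problem with the kubeconfig file, which I don't know how to fix. I set KUBE_CONFIG_PATH, but that didn't help.

Finally I tried creating the cluster again, but it still doesn't work and I get the error: Kubernetes cluster unreachable. What is going on here? I think it has to do with credentials or the kubeconfig, but at this point I'm lost.

Has anyone run into this? How did you solve it?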

Terraform providers file:

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.63.1"
    }

    kubernetes = {
      source = "hashicorp/kubernetes"
      version = "2.21.1"
    }

    helm = {
      source = "hashicorp/helm"
      version = "2.10.1"
    }

    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.7.0"
    }
  }
}

provider "google" {
  project = var.gcp_project_id
  region  = var.region
  credentials = file("${var.credentials_gcp}/service-account.json")
}

provider "kubernetes" {
  # config_path = "~/.kube/config"
  # config_context = "gke_kube-testing-384710_europe-west1_thesis-cluster"
  host                   = "https://${google_container_cluster.primary.endpoint}"
  token                  = data.google_client_config.main.access_token
  cluster_ca_certificate = base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)
}


provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}
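The kubernetes provider above reads data.google_client_config.main.access_token, but no matching data block appears in the files shown. If it isn't declared in another file, a minimal declaration would look like this (a sketch; the data source takes no arguments and reports the token of the credentials the google provider is using):

```hcl
# Exposes the OAuth2 access token (plus project/region) of the credentials
# configured in the google provider; referenced as data.google_client_config.main.
data "google_client_config" "main" {}
```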

Terraform Kubernetes file:

resource "google_container_cluster" "primary" {
  name                     = var.name
  location                 = var.zone
  remove_default_node_pool = true
  initial_node_count       = 1

  network    = google_compute_network.main.self_link
  subnetwork = google_compute_subnetwork.private.self_link

  logging_service    = "none" # "logging.googleapis.com/kubernetes"
  monitoring_service = "none" # "monitoring.googleapis.com/kubernetes"
  networking_mode    = "VPC_NATIVE"

  # Optional, for multi-zonal cluster:
  # if multi-zonal == true, use the zones in locals, else use []
  node_locations = var.multi-zonal ? local.zones : []

  addons_config {
    http_load_balancing {
      disabled = true
    }

    horizontal_pod_autoscaling {
      disabled = true
    }
  }

  vertical_pod_autoscaling {
    enabled = false
  }

  release_channel {
    channel = "REGULAR"
  }

  workload_identity_config {
    workload_pool = "${var.gcp_project_id}.svc.id.goog"
  }

  ip_allocation_policy {
    cluster_secondary_range_name  = "k8s-pod-range"
    services_secondary_range_name = "k8s-service-range"
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # depends_on is a resource-level meta-argument; it is not valid inside ip_allocation_policy
  depends_on = [
    # module.enable_google_apis,
    # module.gcloud
  ]
}

# Get credentials for cluster
resource "null_resource" "gcloud-connection" {
  provisioner "local-exec" {
    command = "gcloud container clusters get-credentials ${var.name} --zone ${var.zone} --project ${var.gcp_project_id}"
  }

  depends_on = [ google_container_cluster.primary ]
}

# Apply YAML kubernetes-manifest configurations      
resource "null_resource" "apply_deployment" {
  provisioner "local-exec" {
    interpreter = ["bash", "-exc"]
    command     = "kubectl apply -k ${var.filepath_manifest} -n ${var.namespace}"
  }

  depends_on = [ 
    null_resource.gcloud-connection
  ]
}

resource "google_service_account" "kubernetes" {
    account_id = "kubernetes"
}

Could I be using the service account incorrectly?

If you need any other code or information, feel free to ask.

Answer 1

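First, I fixed several problems with creating the cluster itself, and now I can create it without any issues. As for the remaining error, I think the problem is that the Helm provider needs the kubeconfig file to access the cluster, but the cluster doesn't exist yet when Terraform configures the providers during plan/apply.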

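To work around this, I found a solution. First I run:

terraform apply -target=google_container_node_pool.general -auto-approve

This creates the cluster. Then I run:

gcloud container clusters get-credentials <cluster-name> --zone <zone> --project <project-id>

This adds the new cluster's credentials to the kubeconfig file. Then I run:

terraform apply -auto-approve

This configures the Helm provider again, now with the credentials of the already-created cluster from the new kubeconfig context, so the Helm provider can install the charts into the cluster.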

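It works now, but I'd like to know whether there is another way to do this using only Terraform instead of the terminal, without going through the whole process every time.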
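One Terraform-only alternative, sketched here under the assumption that the Helm provider version pinned in the question (2.10.1, which uses a nested kubernetes block) is in use: configure the Helm provider from the cluster's attributes, the same way the question's kubernetes provider already is, instead of from ~/.kube/config. This would replace the existing provider "helm" block, since only one default configuration per provider is allowed. The token comes from data.google_client_config.main on each run, so no kubeconfig context has to exist beforehand.

```hcl
# Helm provider configured directly from the GKE cluster resource instead of a
# kubeconfig file, mirroring the question's kubernetes provider block.
provider "helm" {
  kubernetes {
    host                   = "https://${google_container_cluster.primary.endpoint}"
    token                  = data.google_client_config.main.access_token
    cluster_ca_certificate = base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)
  }
}
```

Note that the Kubernetes provider documentation recommends against creating a cluster and configuring a provider from its attributes in the same module, because it can cause intermittent errors (for example when the cluster is being replaced); splitting the cluster and the in-cluster resources into separate configurations is the more robust option.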

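Also, I don't understand why the get-credentials command doesn't work inside Terraform. I run it there too (in the null_resource), but it doesn't seem to make any difference.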
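A likely reason the provisioner makes no difference: a local-exec that writes ~/.kube/config is a side effect Terraform cannot see, and the Helm releases are only ordered after it if they explicitly depend on that null_resource. Expressing the dependency on the node pool directly lets Terraform do the ordering that the separate -target apply currently does by hand. A sketch, assuming the Helm provider is configured from the cluster's attributes rather than from ~/.kube/config; the resource name istio_base is hypothetical, and the repository URL is Istio's public Helm chart repository:

```hcl
# Hypothetical release: installs Istio's "base" chart (CRDs and cluster roles)
# into istio-system once the cluster and its node pool exist.
resource "helm_release" "istio_base" {
  name             = "istio-base"
  repository       = "https://istio-release.storage.googleapis.com/charts"
  chart            = "base"
  namespace        = "istio-system"
  create_namespace = true

  # Wait for worker nodes, not just the control plane, before installing.
  depends_on = [google_container_node_pool.general]
}
```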

Answer 2

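This solution is for AWS, not GCP, but maybe you can apply the same idea. Kubernetes tokens are short-lived, only 10 to 15 minutes, so the token has to be refreshed every time you do something.

Use exec as shown in the documentation, with a small tweak to include the role ARN: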

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args = [
      "eks", "get-token",
      "--cluster-name", data.aws_eks_cluster.cluster.name,
      "--role-arn", var.iam_admin_role_arn # add this
    ]
  }
}
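The same exec idea should carry over to GKE. A sketch, assuming the gke-gcloud-auth-plugin binary is installed (for example with gcloud components install gke-gcloud-auth-plugin) and on PATH: the provider runs the plugin to get a fresh token from the local gcloud login whenever it needs one, instead of relying on a stored token or a kubeconfig context. It would replace the question's token-based kubernetes provider block, and the same exec block can go inside the Helm provider's kubernetes block.

```hcl
# GKE counterpart of the EKS exec configuration above: the provider calls
# gke-gcloud-auth-plugin to obtain a short-lived token on demand.
provider "kubernetes" {
  host                   = "https://${google_container_cluster.primary.endpoint}"
  cluster_ca_certificate = base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "gke-gcloud-auth-plugin"
  }
}
```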
