Node doesn't get certificates when trying to join the cluster with `kubeadm`


I was able to bootstrap the master node of a Kubernetes deployment using `kubeadm`, but I am getting errors during the `kubeadm join phase kubelet-start` step:
kubeadm --v=5 join phase kubelet-start 192.168.1.198:6443 --token x4drpl.ie61lm4vrqyig5vg     --discovery-token-ca-cert-hash sha256:hjksdhjsakdhjsakdhajdka --node-name media-server         
W0118 23:53:28.414247   22327 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
I0118 23:53:28.414383   22327 initconfiguration.go:103] detected and using CRI socket: /var/run/dockershim.sock
I0118 23:53:28.414476   22327 join.go:441] [preflight] Discovering cluster-info
I0118 23:53:28.414744   22327 token.go:188] [discovery] Trying to connect to API Server "192.168.1.198:6443"
I0118 23:53:28.416434   22327 token.go:73] [discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.198:6443"
I0118 23:53:28.433749   22327 token.go:134] [discovery] Requesting info from "https://192.168.1.198:6443" again to validate TLS against the pinned public key
I0118 23:53:28.446096   22327 token.go:152] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.1.198:6443"
I0118 23:53:28.446130   22327 token.go:194] [discovery] Successfully established connection with API Server "192.168.1.198:6443"
I0118 23:53:28.446163   22327 discovery.go:51] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I0118 23:53:28.446186   22327 join.go:455] [preflight] Fetching init configuration
I0118 23:53:28.446197   22327 join.go:493] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0118 23:53:28.461658   22327 interface.go:400] Looking for default routes with IPv4 addresses
I0118 23:53:28.461682   22327 interface.go:405] Default route transits interface "eno2"
I0118 23:53:28.462107   22327 interface.go:208] Interface eno2 is up
I0118 23:53:28.462180   22327 interface.go:256] Interface "eno2" has 2 addresses :[192.168.1.113/24 fe80::225:90ff:febe:5aaf/64].
I0118 23:53:28.462205   22327 interface.go:223] Checking addr  192.168.1.113/24.
I0118 23:53:28.462217   22327 interface.go:230] IP found 192.168.1.113
I0118 23:53:28.462228   22327 interface.go:262] Found valid IPv4 address 192.168.1.113 for interface "eno2".
I0118 23:53:28.462238   22327 interface.go:411] Found active IP 192.168.1.113 
I0118 23:53:28.462284   22327 kubelet.go:107] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I0118 23:53:28.463384   22327 kubelet.go:115] [kubelet-start] writing CA certificate at /etc/kubernetes/pki/ca.crt
I0118 23:53:28.465766   22327 kubelet.go:133] [kubelet-start] Stopping the kubelet
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
Unfortunately, an error has occurred:
        timed out waiting for the condition
This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'
timed out waiting for the condition
error execution phase kubelet-start
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:422
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).BindToCommand.func1.1
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:348
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
        _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:203
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1357

Now, looking at the kubelet logs with `journalctl -xeu kubelet`:

Jan 19 00:04:38 media-server systemd[23817]: kubelet.service: Executing: /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --cgroup-driver=cgroupfs
Jan 19 00:04:38 media-server kubelet[23817]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 19 00:04:38 media-server kubelet[23817]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 19 00:04:38 media-server kubelet[23817]: I0119 00:04:38.706834   23817 server.go:416] Version: v1.17.1
Jan 19 00:04:38 media-server kubelet[23817]: I0119 00:04:38.707261   23817 plugins.go:100] No cloud provider specified.
Jan 19 00:04:38 media-server kubelet[23817]: I0119 00:04:38.707304   23817 server.go:821] Client rotation is on, will bootstrap in background
Jan 19 00:04:38 media-server kubelet[23817]: E0119 00:04:38.709106   23817 bootstrap.go:240] unable to read existing bootstrap client config: invalid configuration: [unable to read client-cert /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory, unable to read client-key /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory]
Jan 19 00:04:38 media-server kubelet[23817]: F0119 00:04:38.709153   23817 server.go:273] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Jan 19 00:04:38 media-server systemd[1]: kubelet.service: Child 23817 belongs to kubelet.service.
Jan 19 00:04:38 media-server systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
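
Note the contradiction with the join log above: kubeadm reported writing `/etc/kubernetes/bootstrap-kubelet.conf` at 23:53, yet about ten minutes later the kubelet cannot stat it. A quick diagnostic sketch to confirm what is actually on disk on the worker (paths taken from the logs above):

# On the worker: verify the files the kubelet is complaining about
systemctl status kubelet --no-pager
ls -l /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/kubelet.conf
ls -l /var/lib/kubelet/pki/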

Interestingly, `kubelet-client-current.pem` is nowhere to be found on the worker that is trying to join; in fact, the only files inside `/var/lib/kubelet/pki` are `kubelet.{crt,key}`.

If I run the following command on the node that is trying to join, I find that all certificates are missing:

# kubeadm alpha certs check-expiration
W0119 00:06:35.088034   24017 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0119 00:06:35.088082   24017 validation.go:28] Cannot validate kubelet config - no validator is available
CERTIFICATE                          EXPIRES   RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
!MISSING! admin.conf                                                                   
!MISSING! apiserver                                                                    
!MISSING! apiserver-etcd-client                                                        
!MISSING! apiserver-kubelet-client                                                     
!MISSING! controller-manager.conf                                                      
!MISSING! etcd-healthcheck-client                                                      
!MISSING! etcd-peer                                                                    
!MISSING! etcd-server                                                                  
!MISSING! front-proxy-client                                                           
!MISSING! scheduler.conf
Error checking external CA condition for ca certificate authority: failure loading certificate for API server: failed to load certificate: couldn't load the certificate file /etc/kubernetes/pki/apiserver.crt: open /etc/kubernetes/pki/apiserver.crt: no such file or directory
To see the stack trace of this error execute with --v=5 or higher

The only file in `/etc/kubernetes/pki` is `ca.crt`.

Both the master and the worker have kubeadm and kubelet at version 1.17.1, so a version mismatch is unlikely.

Possibly unrelated, but also a likely source of errors: Docker on both the worker and the master is set up with `Cgroup Driver: systemd`, yet for some reason the kubelet is being passed `--cgroup-driver=cgroupfs`.

What could be causing this issue? And, more importantly, how do I fix it so the node successfully joins the master?
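
One way to track down where the conflicting flag comes from (a diagnostic sketch, assuming default file locations; per the drop-in shown in the edit below, the kubelet unit sources both `/var/lib/kubelet/kubeadm-flags.env` and `/etc/default/kubelet`):

# Check every env file the kubelet unit sources for a cgroup-driver setting
grep -H cgroup-driver /var/lib/kubelet/kubeadm-flags.env /etc/default/kubelet 2>/dev/null
grep -RH cgroup /etc/systemd/system/kubelet.service.d/ 2>/dev/null

# Compare with the driver Docker is actually using
docker info 2>/dev/null | grep -i 'cgroup driver'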

Edit: more information

On the worker, the systemd drop-in file is:

~# cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf 
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
#Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

The `kubelet` unit service:

~# cat /etc/systemd/system/multi-user.target.wants/kubelet.service 
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target

And the kubelet `config.yaml`:

~# cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

Contents of `/var/lib/kubelet/kubeadm-flags.env` on the worker and the master nodes:

Worker:

KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1"

Master:

KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --resolv-conf=/run/systemd/resolve/resolv.conf"

Docker is at version 18.09 on both the master and the worker, and the config files are identical:

~$ cat /etc/docker/daemon.json
{
 "exec-opts": ["native.cgroupdriver=systemd"],
 "data-root": "/opt/var/docker/"
}
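
If the stray `--cgroup-driver=cgroupfs` turns out to come from one of the kubelet's env files, the deprecation warning in the journal points at setting the driver through the config file instead. A minimal sketch, assuming it is acceptable to append to `/var/lib/kubelet/config.yaml` (which, as shown above, does not yet set `cgroupDriver`, a standard KubeletConfiguration v1beta1 field):

# Align the kubelet with Docker's native.cgroupdriver=systemd via the config file
echo 'cgroupDriver: systemd' >> /var/lib/kubelet/config.yaml
systemctl daemon-reload
systemctl restart kubelet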
kubernetes kubeadm
2 Answers

5 votes

I believe the kubelet service on the worker node is unable to authenticate to the API server because the bootstrap token has expired. Can you regenerate the token on the master node and try running the kubeadm join command on the worker node again?

CMD:  kubeadm token create --print-join-command
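
For reference, a hedged sketch of that flow; the token and hash printed on your master will differ from the ones shown in the question:

# On the master: check whether the original bootstrap token has expired
kubeadm token list

# Create a fresh token and print a complete, ready-to-paste join command
kubeadm token create --print-join-command

# On the worker: clear the state left over from the failed attempt,
# then run the join command printed above
kubeadm reset -f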

0 votes

In case anyone else runs into this: what ultimately fixed it for me, every time, was editing the etcd.yaml manifest on each control-plane node to force the pod to be recreated. I don't think the content of the edit matters; the pod just needs to be restarted.

Once the etcd pods restarted, all the CSRs were approved and the node joined.
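
For reference, a minimal sketch of one way to force that recreation, assuming the default static-pod manifest directory `/etc/kubernetes/manifests`: moving the manifest out and back makes the kubelet delete and recreate the static pod, which achieves the same restart without a meaningful edit.

# On each control-plane node: bounce the static etcd pod by moving its manifest
mv /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml
sleep 20    # give the kubelet time to notice the removal and stop the pod
mv /tmp/etcd.yaml /etc/kubernetes/manifests/etcd.yaml

# Then watch the worker's certificate signing requests get approved,
# or approve any that stay Pending by hand (<csr-name> is a placeholder):
kubectl get csr
kubectl certificate approve <csr-name>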
