rancher-webhook Pod stuck in ContainerCreating


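I have a problem: on our Rancher 2.7.1 Docker single-node installation, the local cluster has gone haywire. I don't know when it started; I noticed it when I could no longer add nodes to our cluster. Every command issued through the Rancher UI is rejected because there is no webhook endpoint, so I can't even restore a snapshot. The webhook pod does not really exist, because it is not getting created like every other pod in the "local" cluster and stays stuck in ContainerCreating.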

Rancher Pods

kubectl --kubeconfig /tmp/local.yaml describe pod rancher-webhook-576c6b955f-2xlw8 -n cattle-system

Name:             rancher-webhook-576c6b955f-2xlw8
Namespace:        cattle-system
Priority:         0
Service Account:  rancher-webhook
Node:             local-node/172.17.0.2
Start Time:       Thu, 16 Mar 2023 13:47:00 +0100
Labels:           app=rancher-webhook
                  pod-template-hash=576c6b955f
Annotations:      cattle.io/timestamp: 2023-03-16T10:38:25Z
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/rancher-webhook-576c6b955f
Containers:
  rancher-webhook:
    Container ID:
    Image:          rancher/rancher-webhook:v0.3.0
    Image ID:
    Ports:          9443/TCP, 8777/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      STAMP:
      ENABLE_CAPI:  true
      ENABLE_MCM:   true
      NAMESPACE:    cattle-system (v1:metadata.namespace)
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lzlsf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rancher-webhook-tls
    Optional:    false
  kube-api-access-lzlsf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 cattle.io/os=linux:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                 From               Message
  ----     ------                  ----                ----               -------
  Normal   Scheduled               15m                 default-scheduler  Successfully assigned cattle-system/rancher-webhook-576c6b955f-2xlw8 to local-node
  Warning  FailedCreatePodSandBox  42s (x70 over 15m)  kubelet            Failed to create pod sandbox: rpc error: code = NotFound desc = failed to create containerd container: failed to create snapshot: missing parent "k8s.io/430/sha256:1021ef88c7974bfff89c5a0ec4fd3160daac6c48a075f74cff721f85dd104e68" bucket: not found
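
The FailedCreatePodSandBox event above names the snapshot that containerd could not find. As a minimal sketch, assuming the message keeps the format shown in the describe output, the key can be pulled out of the event text like this (the variable names `msg`, `parent_key`, and `digest` are illustrative only):

```shell
# Event message copied verbatim from the describe output above.
msg='Failed to create pod sandbox: rpc error: code = NotFound desc = failed to create containerd container: failed to create snapshot: missing parent "k8s.io/430/sha256:1021ef88c7974bfff89c5a0ec4fd3160daac6c48a075f74cff721f85dd104e68" bucket: not found'

# Extract the quoted snapshot key that follows "missing parent".
parent_key=$(printf '%s\n' "$msg" | sed -n 's/.*missing parent "\([^"]*\)".*/\1/p')

# The key appears to have three "/"-separated parts; keep the last one,
# which is the sha256 digest (this reading of the key is an assumption).
digest=${parent_key##*/}

echo "parent key: $parent_key"
echo "digest:     $digest"
```

The extracted digest could then be compared with whatever snapshots containerd still lists on the node, for example with `ctr -n k8s.io snapshots ls`, assuming that tool is reachable inside the Rancher container (not confirmed for this setup).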

Because this is our production system, I can't just spin up a brand-new cluster.

Any hints?

Thanks in advance, Sascha

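I tried restoring a snapshot through Rancher, I updated Kubernetes to the latest version available in Rancher, and I deleted the deployments in the local cluster and re-added them.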

rancher