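I have a problem: on our Rancher 2.7.1 Docker single-node installation, the local cluster has gone haywire. I don't know when it started; I only noticed it when I could no longer add nodes to our cluster. Every command issued through the Rancher UI is rejected because there is no webhook endpoint, so I can't even restore a snapshot. The webhook pod doesn't actually exist: like every other pod in the "local" cluster, it never gets created and stays stuck in ContainerCreating.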
kubectl --kubeconfig /tmp/local.yaml describe pod rancher-webhook-576c6b955f-2xlw8 -n cattle-system
Name:             rancher-webhook-576c6b955f-2xlw8
Namespace:        cattle-system
Priority:         0
Service Account:  rancher-webhook
Node:             local-node/172.17.0.2
Start Time:       Thu, 16 Mar 2023 13:47:00 +0100
Labels:           app=rancher-webhook
                  pod-template-hash=576c6b955f
Annotations:      cattle.io/timestamp: 2023-03-16T10:38:25Z
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/rancher-webhook-576c6b955f
Containers:
  rancher-webhook:
    Container ID:
    Image:          rancher/rancher-webhook:v0.3.0
    Image ID:
    Ports:          9443/TCP, 8777/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      STAMP:
      ENABLE_CAPI:  true
      ENABLE_MCM:   true
      NAMESPACE:    cattle-system (v1:metadata.namespace)
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lzlsf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rancher-webhook-tls
    Optional:    false
  kube-api-access-lzlsf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 cattle.io/os=linux:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                 From               Message
  ----     ------                  ----                ----               -------
  Normal   Scheduled               15m                 default-scheduler  Successfully assigned cattle-system/rancher-webhook-576c6b955f-2xlw8 to local-node
  Warning  FailedCreatePodSandBox  42s (x70 over 15m)  kubelet            Failed to create pod sandbox: rpc error: code = NotFound desc = failed to create containerd container: failed to create snapshot: missing parent "k8s.io/430/sha256:1021ef88c7974bfff89c5a0ec4fd3160daac6c48a075f74cff721f85dd104e68" bucket: not found
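Because this is our production system, I can't simply spin up a brand-new cluster.
Any hints?
Thanks in advance, Sasha
What I have tried so far: restoring a snapshot through Rancher, updating Kubernetes to the latest version available in Rancher, and deleting and re-adding the deployments in the local cluster.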
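
In case it helps anyone reproduce the picture, here is a minimal sketch for listing which pods in the local cluster are stuck in ContainerCreating. It assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); the function name `stuck_pods` is made up for illustration and is not part of Rancher or kubectl.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and
# prints namespace/name for every pod whose STATUS is ContainerCreating.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
stuck_pods() {
  # NR > 1 skips the header row; $4 is the STATUS column.
  awk 'NR > 1 && $4 == "ContainerCreating" { print $1 "/" $2 }'
}

# Usage against the local cluster (same kubeconfig path as in the describe command):
#   kubectl --kubeconfig /tmp/local.yaml get pods -A | stuck_pods
```

This only filters the pod listing; it does not touch the cluster or the containerd snapshot store.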