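Hi everyone, I'm almost done installing a brand-new cluster, but I've run into a strange problem.
I deployed ingress-nginx both from the manifests and from the Helm chart, and both give me the same result: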
kubectl get po
nginx-ingress-dx6bg 0/1 Running 3 (26s ago) 3m44s 10.244.4.118 node-2 <none> <none>
nginx-ingress-gqkhz 0/1 Running 3 (29s ago) 3m47s 10.244.3.16 node-1 <none> <none>
nginx-ingress-dx6bg 0/1 Error 3 (86s ago) 4m44s 10.244.4.118 node-2 <none> <none>
nginx-ingress-gqkhz 0/1 Error 3 (89s ago) 4m47s 10.244.3.16 node-1 <none> <none>
nginx-ingress-dx6bg 0/1 CrashLoopBackOff 3 (12s ago) 4m56s 10.244.4.118 node-2 <none> <none>
nginx-ingress-gqkhz 0/1 CrashLoopBackOff 3 (13s ago) 4m59s 10.244.3.16 node-1 <none> <none>
nginx-ingress-gqkhz 0/1 Running 4 (44s ago) 5m30s 10.244.3.16 node-1 <none> <none>
nginx-ingress-dx6bg 0/1 Running 4 (51s ago) 5m35s 10.244.4.118 node-2 <none> <none>
nginx-ingress-b9fcfbb59-hwjc8 0/1 Running 6 (2m49s ago) 12m 10.244.4.116 node-2 <none> <none>
Describing the pod, the problem appears to be in the readiness probe:
kd po -n nginx-ingress nginx-ingress-b9fcfbb59-hwjc8
Name: nginx-ingress-b9fcfbb59-hwjc8
Namespace: nginx-ingress
Priority: 0
Service Account: nginx-ingress
Node: node-2/192.168.17.15
Start Time: Thu, 08 Feb 2024 17:09:37 +0100
Labels: app=nginx-ingress
app.kubernetes.io/name=nginx-ingress
app.kubernetes.io/version=3.4.2
app.nginx.org/version=1.25.3
pod-template-hash=b9fcfbb59
Annotations: <none>
Status: Running
SeccompProfile: RuntimeDefault
IP: 10.244.4.116
IPs:
IP: 10.244.4.116
Controlled By: ReplicaSet/nginx-ingress-b9fcfbb59
Containers:
nginx-ingress:
Container ID: containerd://57299408237d9d8b1b7be67ac12d6999640ff2249305c8d289a78a58fe6b38c9
Image: nginx/nginx-ingress:3.4.2
Image ID: docker.io/nginx/nginx-ingress@sha256:4b97f1d3466c804d51abbdeb84f2c7c3ea00d6a937a320d62a4cf9d6b447d6ad
Ports: 80/TCP, 443/TCP, 8081/TCP, 9113/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Args:
-nginx-configmaps=$(POD_NAMESPACE)/nginx-config
State: Running
Started: Thu, 08 Feb 2024 17:17:51 +0100
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Thu, 08 Feb 2024 17:15:30 +0100
Finished: Thu, 08 Feb 2024 17:16:30 +0100
Ready: False
Restart Count: 5
Requests:
cpu: 100m
memory: 128Mi
Readiness: http-get http://:readiness-port/nginx-ready delay=0s timeout=1s period=1s #success=1 #failure=3
Environment:
POD_NAMESPACE: nginx-ingress (v1:metadata.namespace)
POD_NAME: nginx-ingress-b9fcfbb59-hwjc8 (v1:metadata.name)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vlfd8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-vlfd8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m57s default-scheduler Successfully assigned nginx-ingress/nginx-ingress-b9fcfbb59-hwjc8 to node-2
Normal Pulling 8m57s kubelet Pulling image "nginx/nginx-ingress:3.4.2"
Normal Pulled 8m35s kubelet Successfully pulled image "nginx/nginx-ingress:3.4.2" in 21.588s (21.589s including waiting)
Normal Created 8m35s kubelet Created container nginx-ingress
Normal Started 8m35s kubelet Started container nginx-ingress
Warning Unhealthy 3m56s (x250 over 8m34s) kubelet Readiness probe failed: Get "http://10.244.4.116:8081/nginx-ready": dial tcp 10.244.4.116:8081: connect: connection refused
Following a known issue reported by NGINX Inc., I told Helm to increase the reload timeout, but it made no difference.
helm install nginx-ingress-controller nginx-stable/nginx-ingress --set rbac.create=true --set controller."nodeSelector\.kubernetes\.io/hostname"=node-2 --set nginxReloadTimeout=20000
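Do you have any suggestions? Ideally something that doesn't require resetting the whole cluster?
On a different cluster it works fine.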
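Purely from a deployment perspective: first rule out a resource problem. Describe the stopped nginx-ingress-gqkhz or nginx-ingress-dx6bg replicas and check for errors. I'd also suggest scaling down to 1 or 2 replicas and seeing whether the container starts. The failed readiness probe on its own doesn't tell you much.
Also, on the containers that show as Running, read the logs (kubectl logs podname -c containername). That may give you some information.
Although I see CrashLoopBackOff on some replicas, I would rule out a network problem, since some replicas did pull the image.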
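For reference, the checks suggested above could be strung together roughly like this. It's only a sketch: the namespace, pod and container names are copied from the describe output in the question, while the Deployment name `nginx-ingress` is an assumption (guessed from the ReplicaSet name `nginx-ingress-b9fcfbb59`), and the `run` helper and `DRY_RUN` switch are made up for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Names taken from the describe output in the question; override via environment.
NS="${NS:-nginx-ingress}"
POD="${POD:-nginx-ingress-b9fcfbb59-hwjc8}"
CONTAINER="${CONTAINER:-nginx-ingress}"
NODE="${NODE:-node-2}"
# Assumption: the Deployment name is inferred from the ReplicaSet name.
DEPLOY="${DEPLOY:-nginx-ingress}"
# Hypothetical switch: 1 (default) only prints commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command, and run it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# 1. Rule out resource pressure on the node (see "Allocated resources").
run kubectl describe node "$NODE"

# 2. Describe the failing pod and read its events.
run kubectl describe pod -n "$NS" "$POD"

# 3. Logs of the crashed instance first, then of the current one.
run kubectl logs -n "$NS" "$POD" -c "$CONTAINER" --previous
run kubectl logs -n "$NS" "$POD" -c "$CONTAINER"

# 4. Scale down to a single replica and watch whether it starts.
run kubectl scale deployment "$DEPLOY" -n "$NS" --replicas=1
run kubectl get pods -n "$NS" -o wide
```

Run it with `DRY_RUN=0` to actually execute the commands. Note that pods named like nginx-ingress-gqkhz may belong to a DaemonSet rather than a Deployment, in which case the scale step does not apply to them.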