GCP GKE DNS 网络不稳定

Question

我们在 GCP 中部署了一个 GKE K8s 集群，并且 Pod 在解析 DNS 时遇到问题。看起来网络可能不稳定，但我不明白发生了什么。

如果我在 Pod 内运行 nslookup，它有时会起作用，有时会出现连接错误。

% kubectl exec -n <namespace> -ti <pod> -- nslookup google.com
;; connection timed out; no servers could be reached

command terminated with exit code 1

如果我查看 resolv.conf，我会发现名称服务器已设置为转至 k8s-dns 服务。

% kubectl exec -n <namespace> -ti <podf> -- cat /etc/resolv.conf
search <namespace>.svc.cluster.local svc.cluster.local cluster.local us-east4-a.c.<project id>.internal c.<project id>.internal google.internal
nameserver 10.128.0.10
options ndots:5

如果我尝试 ping DNS 服务器，它不起作用：

% kubectl exec -n <namespace> -ti <pod> -- ping 10.128.0.10
PING 10.128.0.10 (10.128.0.10): 56 data bytes
^C
--- 10.128.0.10 ping statistics ---
33 packets transmitted, 0 packets received, 100% packet loss
command terminated with exit code 1

如果我使用 netcat 来探索打开连接，它表明打开连接非常不稳定：

% kubectl exec -n <namespace> -ti <pod> -- nc -v -z -w 3  10.128.0.10 53
nc: 10.128.0.10 (10.128.0.10:53): Operation timed out
command terminated with exit code 1
% kubectl exec -n <namespace> -ti <pod> -- nc -v -z -w 3  10.128.0.10 53
10.128.0.10 (10.128.0.10:53) open
% kubectl exec -n <namespace> -ti <pod> -- nc -v -z -w 3  10.128.0.10 53
10.128.0.10 (10.128.0.10:53) open
% kubectl exec -n <namespace> -ti <pod> -- nc -v -z -w 3  10.128.0.10 53
10.128.0.10 (10.128.0.10:53) open
% kubectl exec -n <namespace> -ti <pod> -- nc -v -z -w 3  10.128.0.10 53
10.128.0.10 (10.128.0.10:53) open
% kubectl exec -n <namespace> -ti <pod> -- nc -v -z -w 3  10.128.0.10 53
nc: 10.128.0.10 (10.128.0.10:53): Operation timed out
command terminated with exit code 1

查看运行该IP的服务是否属于k8s-dns：

%% kubectl get services --all-namespaces -o wide | grep dns
kube-system    kube-dns                 ClusterIP      10.128.0.10     <none>         53/UDP,53/TCP      2y27d   k8s-app=kube-dns

什么会导致 GKE 上出现这种不稳定的网络？我该如何解决？

Answer 1

所以，我找到了解决方案，尽管我不确定我是否理解原因。

我在集群上打开了 NodeLocalDNS。

gcloud container clusters update <cluster> --update-addons=NodeLocalDNS=ENABLED

然后我检查并发现 node-local-dns 没有运行。

% kubectl get pods -n kube-system -o wide | grep node-local-dns

因此我触发了集群的节点升级，使其重建所有节点。

这似乎已经解决了它，但我不确定为什么。但至少它又可以工作了。

GCP GKE DNS 网络不稳定

问题描述投票：0回答：1

1个回答

最新问题

GCP GKE DNS 网络不稳定

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1