DNS resolution problem in a Kubernetes cluster

Problem description

We have a Kubernetes cluster consisting of four worker nodes and one master node. On worker1 and worker2 we cannot resolve DNS names, while on the other two nodes everything works fine. I followed the official debugging documentation here and noticed that the CoreDNS pods never receive the queries coming from worker1 and worker2. To repeat: everything is fine on worker3 and worker4; only worker1 and worker2 have the problem. For example, when I run a busybox pod on worker1 and execute nslookup kubernetes.default, it returns nothing, but when the same pod runs on worker3, DNS resolution works.
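
For reference, this is roughly how I pin the test pod to a specific node (a sketch: the busybox:1.28 tag and the kubernetes.io/hostname value are just what I use; substitute your own node names, and the --overrides JSON may need adjusting for other kubectl versions):

$ kubectl run busybox --image=busybox:1.28 --restart=Never \
    --overrides='{"apiVersion":"v1","spec":{"nodeSelector":{"kubernetes.io/hostname":"worker1"}}}' \
    -- sleep 3600
$ kubectl exec -ti busybox -- nslookup kubernetes.default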

Cluster information:

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:43:08Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

$ kubectl get pod -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-576cbf47c7-6dtrc                1/1     Running   5          82d
coredns-576cbf47c7-jvx5l                1/1     Running   6          82d
etcd-master                             1/1     Running   35         298d
kube-apiserver-master                   1/1     Running   14         135m
kube-controller-manager-master          1/1     Running   42         298d
kube-proxy-22f49                        1/1     Running   9          91d
kube-proxy-2s9sx                        1/1     Running   34         298d
kube-proxy-jh2m7                        1/1     Running   5          81d
kube-proxy-rc5r8                        1/1     Running   5          63d
kube-proxy-vg8jd                        1/1     Running   6          104d
kube-scheduler-master                   1/1     Running   39         298d
kubernetes-dashboard-65c76f6c97-7cwwp   1/1     Running   45         293d
tiller-deploy-779784fbd6-dzq7k          1/1     Running   5          87d
weave-net-556ml                         2/2     Running   12         66d
weave-net-h9km9                         2/2     Running   15         81d
weave-net-s88z4                         2/2     Running   0          145m
weave-net-smrgc                         2/2     Running   14         63d
weave-net-xf6ng                         2/2     Running   15         82d
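
Since the problem is node-specific, it also helps to see which CoreDNS and weave-net pod is running on which node; the wide output shows that mapping:

$ kubectl get pods -n kube-system -o wide | grep -E 'coredns|weave'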

$ kubectl logs coredns-576cbf47c7-6dtrc -n kube-system | tail -20
10.44.0.28:32837 - [14/Dec/2019:12:22:51 +0000] 2957 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000661167s
10.44.0.28:51373 - [14/Dec/2019:12:25:09 +0000] 46278 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000440918s
10.44.0.28:51373 - [14/Dec/2019:12:25:09 +0000] 47697 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.00059741s
10.44.0.28:44969 - [14/Dec/2019:12:27:27 +0000] 33222 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.00044739s
10.44.0.28:44969 - [14/Dec/2019:12:27:27 +0000] 52126 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.000310494s
10.44.0.28:39392 - [14/Dec/2019:12:29:11 +0000] 41041 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000481309s
10.44.0.28:40999 - [14/Dec/2019:12:29:11 +0000] 695 "AAAA IN spark-master.svc.cluster.local. udp 48 false 512" NXDOMAIN qr,aa,rd,ra 141 0.000247078s
10.44.0.28:54835 - [14/Dec/2019:12:29:12 +0000] 59604 "AAAA IN spark-master. udp 30 false 512" NXDOMAIN qr,rd,ra 106 0.020408006s
10.44.0.28:38604 - [14/Dec/2019:12:29:15 +0000] 53244 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.000209231s
10.44.0.28:38604 - [14/Dec/2019:12:29:15 +0000] 23079 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,rd,ra 149 0.000191722s
10.44.0.28:57478 - [14/Dec/2019:12:32:15 +0000] 15451 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000383919s
10.44.0.28:57478 - [14/Dec/2019:12:32:15 +0000] 45086 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.001197812s
10.40.0.34:54678 - [14/Dec/2019:12:52:31 +0000] 6509 "A IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000522769s
10.40.0.34:60234 - [14/Dec/2019:12:52:31 +0000] 15538 "AAAA IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000851171s
10.40.0.34:43989 - [14/Dec/2019:12:52:31 +0000] 2712 "AAAA IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000306038s
10.40.0.34:59265 - [14/Dec/2019:12:52:31 +0000] 23765 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000274748s
10.40.0.34:45622 - [14/Dec/2019:13:26:31 +0000] 38766 "AAAA IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000436681s
10.40.0.34:42759 - [14/Dec/2019:13:26:31 +0000] 56753 "A IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000706638s
10.40.0.34:39563 - [14/Dec/2019:13:26:31 +0000] 37876 "AAAA IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000445999s
10.40.0.34:57224 - [14/Dec/2019:13:26:31 +0000] 33157 "A IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000536896s
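
To confirm that queries from worker1/worker2 never reach CoreDNS at all, I grep both CoreDNS logs for the IP of my test pod (busybox is my test pod; paste the IP printed by the first command in place of <busybox-pod-IP>):

$ kubectl get pod busybox -o jsonpath='{.status.podIP}'
$ kubectl logs -n kube-system coredns-576cbf47c7-6dtrc | grep <busybox-pod-IP>
$ kubectl logs -n kube-system coredns-576cbf47c7-jvx5l | grep <busybox-pod-IP>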

$ kubectl get svc -n kube-system
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
kube-dns               ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP   298d
kubernetes-dashboard   ClusterIP   10.96.204.236   <none>        443/TCP         298d
tiller-deploy          ClusterIP   10.110.41.66    <none>        44134/TCP       123d

$ kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                                               AGE
kube-dns   10.32.0.98:53,10.44.0.21:53,10.32.0.98:53 + 1 more...   298d
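
To separate a kube-proxy/iptables problem from a pod-network problem, I also try querying one CoreDNS endpoint IP directly instead of the kube-dns ClusterIP (the 10.32.0.98 address is taken from the endpoint list above; this assumes the busybox pod is running on worker1):

$ kubectl exec -ti busybox -- nslookup kubernetes.default 10.32.0.98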

When the busybox pod is on worker1:

$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
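
For completeness, the service IP can also be queried from the worker1 host itself rather than from a pod (this assumes dig from the dnsutils package is installed on the node):

$ dig +short @10.96.0.10 kubernetes.default.svc.cluster.local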

But when the busybox pod is on worker3:

$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10
Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

All nodes are running Ubuntu 16.04.

The contents of /etc/resolv.conf are identical in all pods.
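
(I compared them like this; busybox2 here is just a hypothetical second test pod running on worker3:)

$ kubectl exec -ti busybox -- cat /etc/resolv.conf     # pod on worker1
$ kubectl exec -ti busybox2 -- cat /etc/resolv.conf    # pod on worker3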

The only difference I can find is in the kube-proxy logs:

kube-proxy log on a working node:

$ kubectl logs kube-proxy-vg8jd -n kube-system
W1214 06:12:19.201889 1 server_others.go:295] Flag proxy-mode="" unknown, assuming iptables proxy
I1214 06:12:19.321747 1 server_others.go:148] Using iptables Proxier.
W1214 06:12:19.332725 1 proxier.go:317] clusterCIDR not specified, unable to distinguish between internal and external traffic
I1214 06:12:19.332949 1 server_others.go:178] Tearing down inactive rules.
I1214 06:12:20.557875 1 server.go:447] Version: v1.12.1
I1214 06:12:20.601081 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1214 06:12:20.601393 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1214 06:12:20.601958 1 conntrack.go:83] Setting conntrack hashsize to 32768
I1214 06:12:20.602234 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1214 06:12:20.602300 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1214 06:12:20.602544 1 config.go:202] Starting service config controller
I1214 06:12:20.602561 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1214 06:12:20.602585 1 config.go:102] Starting endpoints config controller
I1214 06:12:20.602619 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1214 06:12:20.702774 1 controller_utils.go:1034] Caches are synced for service config controller
I1214 06:12:20.702827 1 controller_utils.go:1034] Caches are synced for endpoints config controller

kube-proxy log on a node that is not working:

$ kubectl logs kube-proxy-fgzpf -n kube-system
W1215 12:47:12.660749 1 server_others.go:295] Flag proxy-mode="" unknown, assuming iptables proxy
I1215 12:47:12.679348 1 server_others.go:148] Using iptables Proxier.
W1215 12:47:12.679538 1 proxier.go:317] clusterCIDR not specified, unable to distinguish between internal and external traffic
I1215 12:47:12.679665 1 server_others.go:178] Tearing down inactive rules.
E1215 12:47:12.760702 1 proxier.go:529] Error removing iptables rules in ipvs proxier: error deleting chain "KUBE-MARK-MASQ": exit status 1: iptables: Too many links.
I1215 12:47:12.799926 1 server.go:447] Version: v1.12.1
I1215 12:47:12.832047 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1215 12:47:12.833067 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1215 12:47:12.833266 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1215 12:47:12.833498 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1215 12:47:12.833934 1 config.go:202] Starting service config controller
I1215 12:47:12.834061 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1215 12:47:12.834253 1 config.go:102] Starting endpoints config controller
I1215 12:47:12.834338 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1215 12:47:12.934408 1 controller_utils.go:1034] Caches are synced for service config controller
I1215 12:47:12.934564 1 controller_utils.go:1034] Caches are synced for endpoints config controller

The fifth log line in the second output (the E1215 "Error removing iptables rules in ipvs proxier" line) does not appear in the first one. I don't know whether this is related to the problem.
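
One way to dig into that difference (just a guess on my part that it matters) is to compare the DNS-related NAT rules that kube-proxy programmed on a healthy and on a broken node, and to check for leftover IPVS state. Run as root on each node (the ipvsadm command only applies if ipvsadm is installed):

$ sudo iptables-save -t nat | grep -E 'kube-dns|KUBE-MARK-MASQ' | head -20
$ sudo ipvsadm -Ln | head -20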

Any suggestions would be appreciated.

kubernetes dns kubeadm coredns
1 Answer
The doubled svc.svc in kubernetes.default.svc.svc.cluster.local looks suspicious. Check whether the same queries show up in the coredns-576cbf47c7-6dtrc pod as well.

Shut down the coredns-576cbf47c7-6dtrc pod to make sure that the single remaining DNS instance answers the DNS queries from all worker nodes.
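
One way to do that (a sketch, assuming the usual kubeadm layout where CoreDNS runs as the coredns Deployment in kube-system) is to scale it down to a single replica:

$ kubectl -n kube-system scale deployment coredns --replicas=1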

According to the docs, problems like this "...indicate a problem with the coredns/kube-dns add-on or with associated Services." Restarting CoreDNS may resolve the issue.
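
On a cluster this old (kubectl rollout restart only appeared in 1.15), the simplest restart is to delete the pods and let the Deployment recreate them. k8s-app=kube-dns is the label kubeadm normally puts on the CoreDNS pods, so verify it first with kubectl get pods --show-labels:

$ kubectl -n kube-system delete pod -l k8s-app=kube-dns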

I would also add comparing /etc/resolv.conf on the nodes to the list of things to check.
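
For example (assuming SSH access to the nodes):

$ for n in worker1 worker2 worker3 worker4; do echo "== $n =="; ssh "$n" cat /etc/resolv.conf; done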
