我目前正在完成学士学位的学习实习,我的项目正在做 CI/CD 管道,所以我对 DevOps 有点陌生,所以我希望有人能给我一些宝贵的时间来帮助我。 使用 NAT 网络在 vmware 上设置 3 个 ubuntu 服务器实例后 这是我使用 kubeadm 设置具有 1 个主节点和 2 个工作节点的 kubernetes 集群所执行的命令列表
apt install docker.io -y
sudo swapoff -a
nano /etc/fstab
#ive commented the swap line here
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
sudo systemctl enable --now kubelet
#then on my master node ive executed the kubeadm init and did the export config thing and joined the worker nodes using the cmd the master gave me
sudo kubeadm init
前 10 分钟一切正常,我加入了安装了 calico 等的节点,然后每当我尝试执行“kubectl getnodes”时,我都会收到错误
The connection to the server 192.168.149.141:6443 was refused - did you specify the right host or port?
我尝试重新启动 docker 和 kubelet 服务,它又解决了 10 分钟的问题,然后再次崩溃 另外,我注意到一些 Pod 不断重新启动,您可以看到 CrashLoopBackOff,这不是性能问题,因为当我安装指标时,所有节点上的 Cpu 和 ram 使用率低于 40%
这是 kubelet 的日志: https://imgur.com/a/sWH3oUb
我的 kube-apiserver.yaml
GNU nano 6.2 kube-apiserver.yaml apiVersion: v1
kind: Pod
metadata:
annotations:
kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.149.141:6443
creationTimestamp: null
labels:
component: kube-apiserver
tier: control-plane
name: kube-apiserver
namespace: kube-system
spec:
containers:
- command:
- kube-apiserver
- --advertise-address=192.168.149.141
- --allow-privileged=true
- --authorization-mode=Node,RBAC
- --client-ca-file=/etc/kubernetes/pki/ca.crt
- --enable-admission-plugins=NodeRestriction
- --enable-bootstrap-token-auth=true
- --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
- --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
- --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
- --etcd-servers=https://127.0.0.1:2379
- --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
- --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
- --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
- --requestheader-allowed-names=front-proxy-client
- --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
- --requestheader-extra-headers-prefix=X-Remote-Extra-
- --requestheader-group-headers=X-Remote-Group
- --requestheader-username-headers=X-Remote-User
- --secure-port=6443
- --service-account-issuer=https://kubernetes.default.svc.cluster.local
- --service-account-key-file=/etc/kubernetes/pki/sa.pub
- --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
- --service-cluster-ip-range=10.96.0.0/12
- --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
- --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
image: registry.k8s.io/kube-apiserver:v1.29.3
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 192.168.149.141
httpGet:
host: 192.168.149.141
path: /livez
port: 6443
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
name: kube-apiserver
readinessProbe:
failureThreshold: 3
httpGet:
host: 192.168.149.141
path: /readyz
port: 6443
scheme: HTTPS
periodSeconds: 1
timeoutSeconds: 15
resources:
requests:
cpu: 250m
startupProbe:
failureThreshold: 24
httpGet:
host: 192.168.149.141
path: /livez
port: 6443
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
volumeMounts:
- mountPath: /etc/ssl/certs
name: ca-certs
readOnly: true
- mountPath: /etc/ca-certificates
name: etc-ca-certificates
readOnly: true
- mountPath: /etc/pki
name: etc-pki
readOnly: true
- mountPath: /etc/kubernetes/pki
name: k8s-certs
readOnly: true
- mountPath: /usr/local/share/ca-certificates
name: usr-local-share-ca-certificates
readOnly: true
- mountPath: /usr/share/ca-certificates
readOnly: true
- mountPath: /usr/share/ca-certificates
name: usr-share-ca-certificates
readOnly: true
hostNetwork: true
priority: 2000001000
priorityClassName: system-node-critical
securityContext:
seccompProfile:
type: RuntimeDefault
volumes:
- hostPath:
path: /etc/ssl/certs
type: DirectoryOrCreate
name: ca-certs
- hostPath:
path: /etc/ca-certificates
type: DirectoryOrCreate
name: etc-ca-certificates
- hostPath:
path: /etc/pki
type: DirectoryOrCreate
name: etc-pki
- hostPath:
path: /etc/kubernetes/pki
type: DirectoryOrCreate
name: k8s-certs
- hostPath:
path: /usr/local/share/ca-certificates
type: DirectoryOrCreate
name: usr-local-share-ca-certificates
- hostPath:
path: /usr/share/ca-certificates
type: DirectoryOrCreate
name: usr-share-ca-certificates
status: {}
我准备提供更多信息
您可以尝试以下步骤来缩小导致问题的组件范围:
Node
(192.168.149.141)之一,尝试从 localhost:6443 连接到 API 服务器
这应该可以帮助您验证 API 服务器是否无法正常工作或其网络相关。
nc -v -l 50000
(192.168.149.141)上打开一个Node
进程,并通过Node
连接到nc -v 192.168.149.141 50000
,并尝试发送消息以查看远程Node
是否收到它们。
如果是网络相关问题,这将帮助您确定是 NAT(VMWare) 相关问题还是路由器相关问题。
如果你不耐烦,可以尝试使用
Rancher
(或RKE)来部署集群,而不是kubeadmin
,对初学者更友好。