失败的K8s rabbitmq-peer-discovery-k8s群集

Question

我正在尝试使用Rabbitmq-peer-discovery-k8s插件在Kubernetes上建立一个RabbitMQ集群，我总是只在pod上运行并准备就绪，但下一个总是失败。

我尝试对我的配置进行多项更改，这就是至少有一个pod运行的原因

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq 
  namespace: namespace-dev
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
  namespace: namespace-dev
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
  namespace: namespace-dev
subjects:
- kind: ServiceAccount
  name: rabbitmq
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "rabbitmq-data"
  labels:
    name: "rabbitmq-data"
    release: "rabbitmq-data"
    namespace: "namespace-dev"
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - "ReadWriteMany"
  nfs:
    path: "/path/to/nfs"
    server: "xx.xx.xx.xx"
  persistentVolumeReclaimPolicy: Retain

---  
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "rabbitmq-data-claim"
  namespace: "namespace-dev"
spec:
  accessModes:
    - ReadWriteMany
  resources:  
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      release: rabbitmq-data
---
# headless service Used to access pods using hostname
kind: Service
apiVersion: v1
metadata:
  name: rabbitmq-headless
  namespace: namespace-dev
spec:
  clusterIP: None
  # publishNotReadyAddresses, when set to true, indicates that DNS implementations must publish the notReadyAddresses of subsets for the Endpoints associated with the Service.     The default value is false. The primary use case for setting this field is to use a StatefulSet's Headless Service to propagate SRV records for its Pods without respect to     their readiness for purpose of peer discovery. This field will replace the service.alpha.kubernetes.io/tolerate-unready-endpoints when that annotation is deprecated and all clients have been converted to use this field.
  # Since access to the Pod using DNS requires Pod and Headless service to be started before launch, publishNotReadyAddresses is set to true to prevent readinessProbe from finding DNS when the service is not started.
  publishNotReadyAddresses: true 
  ports: 
   - name: amqp
     port: 5672
   - name: http
     port: 15672
  selector:
    app: rabbitmq
---
# Used to expose the dashboard to the external network
kind: Service
apiVersion: v1
metadata:
  namespace: namespace-dev
  name: rabbitmq-service
spec:
  type: NodePort
  ports:
   - name: http
     protocol: TCP
     port: 15672
     targetPort: 15672
     nodePort: 31672
   - name: amqp
     protocol: TCP
     port: 5672
     targetPort: 5672
     nodePort: 30672
  selector:
    app: rabbitmq
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: namespace-dev
data:
  enabled_plugins: |
      [rabbitmq_management,rabbitmq_peer_discovery_k8s].
  rabbitmq.conf: |
      cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
      cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
      cluster_formation.k8s.address_type = hostname
      cluster_formation.node_cleanup.interval = 10
      cluster_formation.node_cleanup.only_log_warning = true
      cluster_partition_handling = autoheal
      queue_master_locator=min-masters
      loopback_users.guest = false

      cluster_formation.randomized_startup_delay_range.min = 0
      cluster_formation.randomized_startup_delay_range.max = 2
      cluster_formation.k8s.service_name = rabbitmq-headless
      cluster_formation.k8s.hostname_suffix = .rabbitmq-headless.namespace-dev.svc.cluster.local
      vm_memory_high_watermark.absolute = 1.6GB
      disk_free_limit.absolute = 2GB

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: rabbitmq
spec:
  serviceName: rabbitmq-headless   # Must be the same as the name of the headless service, used for hostname propagation access pod
  selector:
    matchLabels:
      app: rabbitmq # In apps/v1, it needs to be the same as .spec.template.metadata.label for hostname propagation access pods, but not in apps/v1beta
  replicas: 3
  template:
    metadata:
      labels:
        app: rabbitmq  # In apps/v1, the same as .spec.selector.matchLabels
      # setting podAntiAffinity
      annotations:
        scheduler.alpha.kubernetes.io/affinity: >
            {
              "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [{
                  "labelSelector": {
                    "matchExpressions": [{
                      "key": "app",
                      "operator": "In",
                      "values": ["rabbitmq"]
                    }]
                  },
                  "topologyKey": "kubernetes.io/hostname"
                }]
              }
            }
    spec:
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      containers:        
      - name: rabbitmq
        image: rabbitmq:3.7.10
        resources:
          limits:
            cpu: "0.5"
            memory: 2Gi
          requests:
            cpu: "0.3"
            memory: 2Gi
        volumeMounts:
          - name: config-volume
            mountPath: /etc/rabbitmq
          - name: rabbitmq-data
            mountPath: /var/lib/rabbitmq/mnesia
        ports:
          - name: http
            protocol: TCP
            containerPort: 15672
          - name: amqp
            protocol: TCP
            containerPort: 5672
        livenessProbe:
          exec:
            command: ["rabbitmqctl", "status"]
          initialDelaySeconds: 60
          periodSeconds: 60
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command: ["rabbitmqctl", "status"]
          initialDelaySeconds: 20
          periodSeconds: 60
          timeoutSeconds: 5
        imagePullPolicy: IfNotPresent
        env:
          - name: HOSTNAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: RABBITMQ_USE_LONGNAME
            value: "true"
          - name: RABBITMQ_NODENAME
            value: "rabbit@$(HOSTNAME).rabbitmq-headless.namespace-dev.svc.cluster.local"
          # If service_name is set in ConfigMap, there is no need to set it again here.
          # - name: K8S_SERVICE_NAME
          #   value: "rabbitmq-headless"
          - name: RABBITMQ_ERLANG_COOKIE
            value: "mycookie" 
      volumes:
        - name: config-volume
          configMap:
            name: rabbitmq-config
            items:
            - key: rabbitmq.conf
              path: rabbitmq.conf
            - key: enabled_plugins
              path: enabled_plugins
        - name: rabbitmq-data
          persistentVolumeClaim:
            claimName: rabbitmq-data-claim

我只运行一个pod并准备好而不是3个副本

[admin@devsvr3 yaml]$ kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
rabbitmq-0                    1/1     Running   0          2m2s
rabbitmq-1                    0/1     Running   1          43s

检查失败的吊舱我得到了这个。

[admin@devsvr3 yaml]$ kubectl logs rabbitmq-1

  ##  ##
  ##  ##      RabbitMQ 3.7.10. Copyright (C) 2007-2018 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See http://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>

              Starting broker...
2019-02-06 21:09:03.303 [info] <0.211.0> 
 Starting RabbitMQ 3.7.10 on Erlang 21.2.3
 Copyright (C) 2007-2018 Pivotal Software, Inc.
 Licensed under the MPL.  See http://www.rabbitmq.com/
2019-02-06 21:09:03.315 [info] <0.211.0> 
 node           : rabbit@rabbitmq-1.rabbitmq-headless.namespace-dev.svc.cluster.local
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : XhdCf8zpVJeJ0EHyaxszPg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-1.rabbitmq-headless.namespace-dev.svc.cluster.local
2019-02-06 21:09:10.617 [error] <0.219.0> Unable to parse vm_memory_high_watermark value "1.6GB"
2019-02-06 21:09:10.617 [info] <0.219.0> Memory high watermark set to 103098 MiB (108106919116 bytes) of 257746 MiB (270267297792 bytes) total
2019-02-06 21:09:10.690 [info] <0.221.0> Enabling free disk space monitoring
2019-02-06 21:09:10.690 [info] <0.221.0> Disk free limit set to 2000MB
2019-02-06 21:09:10.698 [info] <0.224.0> Limiting to approx 1048476 file handles (943626 sockets)
2019-02-06 21:09:10.698 [info] <0.225.0> FHC read buffering:  OFF
2019-02-06 21:09:10.699 [info] <0.225.0> FHC write buffering: ON
2019-02-06 21:09:10.702 [info] <0.211.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-1.rabbitmq-headless.namespace-dev.svc.cluster.local is empty. Assuming we need to join an existing cluster or initialise from scratch...
2019-02-06 21:09:10.702 [info] <0.211.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2019-02-06 21:09:10.702 [info] <0.211.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2019-02-06 21:09:10.702 [info] <0.211.0> Peer discovery backend does not support locking, falling back to randomized delay
2019-02-06 21:09:10.702 [info] <0.211.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2019-02-06 21:09:10.710 [info] <0.211.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},
                 {inet,[inet],nxdomain}]}
2019-02-06 21:09:10.711 [error] <0.210.0> CRASH REPORT Process <0.210.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164 in application_master:init/4 line 138
2019-02-06 21:09:10.711 [info] <0.43.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"{failed_connect,[{to_address,{\\"kubernetes.default.svc.cluster.local\\",443}},\n                 {inet,[inet],nxdomain}]}\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,164}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,144}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,111}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,815}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"{failed_connect,[{to_address,{\"kubernetes.defau

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
[admin@devsvr3 yaml]$

我在这做错了什么？

Answer 1

最后我通过在我的pod的/etc/resolv.conf中添加它来修复它：

[my-rabbit-svc].[my-rabbitmq-namespace].svc.[cluster-name]

要在我的pod中添加这个，我在StatefulSet中使用了这个设置：

dnsConfig:
    searches:
      - [my-rabbit-svc].[my-rabbitmq-namespace].svc.[cluster-name]

完整文档here

Answer 2

尝试设置：

cluster_formation.k8s.host = [your kubernetes endpoint ip addres]
cluster_formation.k8s.port = [your kubernetes endpoint port]

因为你的pod似乎无法解决这个名字：

kubernetes.default.svc.cluster.local

Answer 3

尝试使用这个稳定的头盔图https://github.com/helm/charts/tree/master/stable/rabbitmq

Answer 4

这里可能的解决方案之一，而不是将dns配置附加到pod - 使用k8s代理边车。所以，而不是解决

kubernetes.default.svc.cluster.local

你可以设置像边车一样的容器

- name: "k8s-api-sidecar"
  image: "tommyvn/kubectl-proxy:latest"

在statefullset / deployment中更改configmap以使用它

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = localhost
cluster_formation.k8s.port = 8001
cluster_formation.k8s.scheme = http

如果你看看这个存储库https://github.com/tommyvn/kubectl-proxy，你会发现它只是一个调用kubectl代理。

失败的K8s rabbitmq-peer-discovery-k8s群集

问题描述投票：2回答：4

4个回答

最新问题

失败的K8s rabbitmq-peer-discovery-k8s群集

问题描述 投票：2回答：4

4个回答

最新问题

问题描述投票：2回答：4