helm filebeat pod on AKS (Azure k8s) crashes with CrashLoopBackOff due to: Exiting: cannot obtain lockfile


I am trying to set up an ELK cluster using this helm repository:

elastic https://helm.elastic.co

Elasticsearch was set up successfully with

helm install elasticsearch elastic/elasticsearch --namespace=logging

The problem occurs when setting up filebeat with

helm install filebeat elastic/filebeat --namespace=logging --values filebeat-values.yaml

where filebeat-values.yaml is:

daemonset:
  extraEnvs:
    - name: "ELASTICSEARCH_USERNAME"
      valueFrom:
        secretKeyRef:
          name: elasticsearch-master-credentials
          key: username
    - name: "ELASTICSEARCH_PASSWORD"
      valueFrom:
        secretKeyRef:
          name: elasticsearch-master-credentials
          key: password

filebeatConfig:
  filebeat.yml: |
    logging.metrics.enabled: false
    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/agri-check*.log
        json:
          keys_under_root: true
          overwrite_keys: true
        processors:
          - add_kubernetes_metadata:
              host: ${NODE_NAME}
              matchers:
                - logs_path:
                    logs_path: "/var/log/containers/"
      - type: container
        paths:
          - /var/log/containers/*.log
        exclude_files: ['.*/agri-check.*$']
        processors:
          - add_kubernetes_metadata:
              host: ${NODE_NAME}
              matchers:
                - logs_path:
                    logs_path: "/var/log/containers/"

    output.elasticsearch:
      host: '${NODE_NAME}'
      hosts: "https://elasticsearch-master:9200"
      username: '${ELASTICSEARCH_USERNAME}'
      password: '${ELASTICSEARCH_PASSWORD}'
      protocol: https
      ssl.verification_mode: none

One of the filebeat pods (one pod per node in the k8s node pool) is crashing:

filebeat-filebeat-x74hz     0/1     CrashLoopBackOff   6 (3m7s ago)    9m2s

with the following log output:

{"log.level":"info","@timestamp":"2023-07-13T08:55:56.776Z","log.origin":{"file.name":"instance/beat.go","file.line":392},"message":"filebeat stopped.","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-07-13T08:55:56.776Z","log.origin":{"file.name":"instance/beat.go","file.line":1057},"message":"Exiting: cannot obtain lockfile: connot start, data directory belongs to process with pid 8","service.name":"filebeat","ecs.version":"1.6.0"}
Exiting: cannot obtain lockfile: connot start, data directory belongs to process with pid 8

This is the only pod crashing. The other pods on the other nodes run fine, and I have the same setup working on a different cluster (our QA cluster). This is a problem because the rest of my setup only proceeds once one pod is up on every K8s node. The connection to Elasticsearch works. Research on the internet suggested deleting the lock file (

https://discuss.elastic.co/t/filebeat-crashloopbackoff/320830

), but I can't, because the pod is stopped and I cannot connect to it via a shell. Do I have any other debugging options?
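A few generic ways to inspect a pod that crashes before you can exec into it might help here (pod and namespace names taken from the question above; the node data-directory location is an assumption based on the chart's default hostPath mount, so verify it on your cluster):

```shell
# Logs of the previous, crashed container instance
# (the freshly restarted one may not have logged anything yet)
kubectl logs filebeat-filebeat-x74hz -n logging --previous

# Events, restart count and the last exit state of the container
kubectl describe pod filebeat-filebeat-x74hz -n logging

# The elastic/filebeat chart persists Filebeat's data directory on the
# node via a hostPath volume, so a stale lock file can survive pod
# restarts. It can be removed from a node debug pod (kubectl >= 1.20;
# the node filesystem is mounted under /host):
kubectl debug node/<affected-node> -it --image=busybox

# inside the debug pod -- the exact directory name depends on the
# release and namespace, so search for it first:
find /host/var/lib -maxdepth 1 -name '*filebeat*'
rm /host/var/lib/<filebeat-data-dir>/filebeat.lock   # path assumed
```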

elasticsearch kubernetes-helm azure-aks filebeat
1 Answer

In a Kubernetes environment where Filebeat runs as a DaemonSet, it is essential that each pod's data directory has its own unique path.

Try:

path.data: /usr/share/filebeat/data/${HOSTNAME}
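In terms of the chart values above, this setting would go at the top of the filebeat.yml block in filebeat-values.yaml (a sketch; the existing inputs and output sections stay unchanged):

```yaml
filebeatConfig:
  filebeat.yml: |
    # Per-pod data directory: inside the container, HOSTNAME resolves
    # to the pod name, which is unique for every DaemonSet pod, so no
    # two Filebeat instances contend for the same lock file.
    path.data: /usr/share/filebeat/data/${HOSTNAME}
    # ... existing logging.metrics / filebeat.inputs /
    #     output.elasticsearch settings as before ...
```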
