helm filebeat pod on AKS (Azure k8s) crashes with CrashLoopBackOff due to: Exiting: cannot obtain lockfile


I am trying to set up an ELK cluster using this helm repository:

elastic https://helm.elastic.co

Elasticsearch was set up successfully with

helm install elasticsearch elastic/elasticsearch --namespace=logging

The problem occurs when setting up filebeat with

helm install filebeat elastic/filebeat --namespace=logging --values filebeat-values.yaml

where filebeat-values.yaml is:

daemonset:
  extraEnvs:
    - name: "ELASTICSEARCH_USERNAME"
      valueFrom:
        secretKeyRef:
          name: elasticsearch-master-credentials
          key: username
    - name: "ELASTICSEARCH_PASSWORD"
      valueFrom:
        secretKeyRef:
          name: elasticsearch-master-credentials
          key: password

filebeatConfig:
  filebeat.yml: |
    logging.metrics.enabled: false
    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/agri-check*.log
        json:
          keys_under_root: true
          overwrite_keys: true
        processors:
          - add_kubernetes_metadata:
              host: ${NODE_NAME}
              matchers:
                - logs_path:
                    logs_path: "/var/log/containers/"
      - type: container
        paths:
          - /var/log/containers/*.log
        exclude_files: ['.*/agri-check.*$']
        processors:
          - add_kubernetes_metadata:
              host: ${NODE_NAME}
              matchers:
                - logs_path:
                    logs_path: "/var/log/containers/"

    output.elasticsearch:
      host: '${NODE_NAME}'
      hosts: "https://elasticsearch-master:9200"
      username: '${ELASTICSEARCH_USERNAME}'
      password: '${ELASTICSEARCH_PASSWORD}'
      protocol: https
      ssl.verification_mode: none

One of the filebeat pods (one pod per node in the k8s node pool) is crashing:

filebeat-filebeat-x74hz     0/1     CrashLoopBackOff   6 (3m7s ago)    9m2s

with the following log output:

{"log.level":"info","@timestamp":"2023-07-13T08:55:56.776Z","log.origin":{"file.name":"instance/beat.go","file.line":392},"message":"filebeat stopped.","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-07-13T08:55:56.776Z","log.origin":{"file.name":"instance/beat.go","file.line":1057},"message":"Exiting: cannot obtain lockfile: connot start, data directory belongs to process with pid 8","service.name":"filebeat","ecs.version":"1.6.0"}
Exiting: cannot obtain lockfile: connot start, data directory belongs to process with pid 8

This is the only pod crashing. The other pods on the other nodes run fine, and I have the same setup working on a different cluster (our QA cluster). This is a problem because the rest of my setup only proceeds once one pod is up on every K8s node. The connection to Elasticsearch works. Research on the internet suggested deleting the lock file (

https://discuss.elastic.co/t/filebeat-crashloopbackoff/320830

), but I can't, because the pod is stopped and I cannot connect to it via a shell. Do I have any other debugging options?
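A few generic ways to inspect a pod that crashes before you can exec into it might help here (pod and namespace names taken from the question above; the node data-directory location is an assumption based on the chart's default hostPath mount, so verify it on your cluster):

```shell
# Logs of the previous, crashed container instance
# (the freshly restarted one may not have logged anything yet)
kubectl logs filebeat-filebeat-x74hz -n logging --previous

# Events, restart count and the last exit state of the container
kubectl describe pod filebeat-filebeat-x74hz -n logging

# The elastic/filebeat chart persists Filebeat's data directory on the
# node via a hostPath volume, so a stale lock file can survive pod
# restarts. It can be removed from a node debug pod (kubectl >= 1.20;
# the node filesystem is mounted under /host):
kubectl debug node/<affected-node> -it --image=busybox

# inside the debug pod -- the exact directory name depends on the
# release and namespace, so search for it first:
find /host/var/lib -maxdepth 1 -name '*filebeat*'
rm /host/var/lib/<filebeat-data-dir>/filebeat.lock   # path assumed
```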

elasticsearch kubernetes-helm azure-aks filebeat
1 Answer

In a Kubernetes environment where Filebeat runs as a DaemonSet, it is essential that each pod's data directory has its own unique path.

Try:

path.data: /usr/share/filebeat/data/${HOSTNAME}
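In terms of the chart values above, this setting would go at the top of the filebeat.yml block in filebeat-values.yaml (a sketch; the existing inputs and output sections stay unchanged):

```yaml
filebeatConfig:
  filebeat.yml: |
    # Per-pod data directory: inside the container, HOSTNAME resolves
    # to the pod name, which is unique for every DaemonSet pod, so no
    # two Filebeat instances contend for the same lock file.
    path.data: /usr/share/filebeat/data/${HOSTNAME}
    # ... existing logging.metrics / filebeat.inputs /
    #     output.elasticsearch settings as before ...
```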
