I'm trying to set up an ELK cluster using this helm repository:

```
elastic https://helm.elastic.co
```

Elasticsearch was set up successfully with

```shell
helm install elasticsearch elastic/elasticsearch --namespace=logging
```

While trying to set up Filebeat with

```shell
helm install filebeat elastic/filebeat --namespace=logging --values filebeat-values.yaml
```

where filebeat-values.yaml is:
```yaml
daemonset:
  extraEnvs:
    - name: "ELASTICSEARCH_USERNAME"
      valueFrom:
        secretKeyRef:
          name: elasticsearch-master-credentials
          key: username
    - name: "ELASTICSEARCH_PASSWORD"
      valueFrom:
        secretKeyRef:
          name: elasticsearch-master-credentials
          key: password
  filebeatConfig:
    filebeat.yml: |
      logging.metrics.enabled: false
      filebeat.inputs:
        - type: container
          paths:
            - /var/log/containers/agri-check*.log
          json:
            keys_under_root: true
            overwrite_keys: true
          processors:
            - add_kubernetes_metadata:
                host: ${NODE_NAME}
                matchers:
                  - logs_path:
                      logs_path: "/var/log/containers/"
        - type: container
          paths:
            - /var/log/containers/*.log
          exclude_files: ['.*/agri-check.*$']
          processors:
            - add_kubernetes_metadata:
                host: ${NODE_NAME}
                matchers:
                  - logs_path:
                      logs_path: "/var/log/containers/"
      output.elasticsearch:
        host: '${NODE_NAME}'
        hosts: "https://elasticsearch-master:9200"
        username: '${ELASTICSEARCH_USERNAME}'
        password: '${ELASTICSEARCH_PASSWORD}'
        protocol: https
        ssl.verification_mode: none
```
one of the filebeat pods (one per node in the K8s node pool) is crashing:

```
filebeat-filebeat-x74hz   0/1   CrashLoopBackOff   6 (3m7s ago)   9m2s
```

with this log output:

```
{"log.level":"info","@timestamp":"2023-07-13T08:55:56.776Z","log.origin":{"file.name":"instance/beat.go","file.line":392},"message":"filebeat stopped.","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-07-13T08:55:56.776Z","log.origin":{"file.name":"instance/beat.go","file.line":1057},"message":"Exiting: cannot obtain lockfile: connot start, data directory belongs to process with pid 8","service.name":"filebeat","ecs.version":"1.6.0"}
Exiting: cannot obtain lockfile: connot start, data directory belongs to process with pid 8
```

This is the only pod that crashes. The other pods on the other nodes run fine, and I have the same setup on a different cluster (the QA cluster). It is a problem because the rollout only continues once one pod is running on every K8s node. The connection to Elasticsearch works fine. Some research on the internet (https://discuss.elastic.co/t/filebeat-crashloopbackoff/320830) suggested deleting the lock file. But I can't, because the pod is stopped and I cannot open a shell into it. What other debugging options do I have?
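For completeness, these are the kubectl options I'm aware of for inspecting a pod that won't stay up (a sketch, using the pod name and namespace from above; `kubectl debug` with `--target` requires ephemeral-container support in the cluster):

```shell
# Full log of the previous (crashed) container run
kubectl logs filebeat-filebeat-x74hz -n logging --previous

# Events, restart count, and exit codes
kubectl describe pod filebeat-filebeat-x74hz -n logging

# Attach an ephemeral debug container sharing the pod's process
# namespace; it stays up even while the main container crash-loops
kubectl debug -it filebeat-filebeat-x74hz -n logging --image=busybox --target=filebeat
```

If the chart mounts the Filebeat data directory from a hostPath on the node (which I believe is the default in elastic's chart), a shell on the node itself would be another way to reach the lock file.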
When Filebeat runs as a DaemonSet in a Kubernetes environment, it is essential that each pod's data directory has its own unique path.

Try:

```yaml
path.data: /usr/share/filebeat/data/${HOSTNAME}
```
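In the Helm values from the question, that line goes inside the `filebeat.yml` block (a sketch; `${HOSTNAME}` is expanded by Filebeat at startup, and for a DaemonSet pod it resolves to the unique pod name, so each pod gets its own data directory):

```yaml
daemonset:
  filebeatConfig:
    filebeat.yml: |
      path.data: /usr/share/filebeat/data/${HOSTNAME}
      logging.metrics.enabled: false
      filebeat.inputs:
        # ... inputs and output.elasticsearch as in the question ...
```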