I am trying to ship logs from AWS EKS to AWS CloudWatch using Fluent Bit. My Fluent Bit configuration works in general, and most logs arrive in CloudWatch, but larger logs cause problems: first the plugin starts truncating them, and eventually it drops records entirely.
[2023/12/05 14:10:54] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=77603133 watch_fd=5
[2023/12/05 14:30:14] [ warn] [output:cloudwatch_logs:cloudwatch_logs.0] [size=294805] Truncating event which is larger than max size allowed by CloudWatch
[2023/12/05 14:30:19] [ warn] [output:cloudwatch_logs:cloudwatch_logs.0] [size=364675] Truncating event which is larger than max size allowed by CloudWatch
[2023/12/05 14:30:24] [ warn] [output:cloudwatch_logs:cloudwatch_logs.0] [size=523322] Truncating event which is larger than max size allowed by CloudWatch
[2023/12/05 14:30:29] [ warn] [output:cloudwatch_logs:cloudwatch_logs.0] Discarding massive log record
[2023/12/05 14:30:29] [ warn] [output:cloudwatch_logs:cloudwatch_logs.0] [size=836367] Truncating event which is larger than max size allowed by CloudWatch
[2023/12/05 14:30:34] [ warn] [output:cloudwatch_logs:cloudwatch_logs.0] [size=554279] Truncating event which is larger than max size allowed by CloudWatch
[2023/12/05 14:30:39] [ warn] [output:cloudwatch_logs:cloudwatch_logs.0] [size=731990] Truncating event which is larger than max size allowed by CloudWatch
[2023/12/05 14:30:39] [ warn] [output:cloudwatch_logs:cloudwatch_logs.0] [size=793332] Truncating event which is larger than max size allowed by CloudWatch
Here is my current setup:
fluent-bit.conf
fluent-bit.conf: |
[SERVICE]
Flush 5
Grace 30
Log_Level info
Daemon off
Parsers_File parsers.conf
HTTP_Server ${HTTP_SERVER}
HTTP_Listen 0.0.0.0
HTTP_Port ${HTTP_PORT}
storage.path /var/fluent-bit/state/flb-storage/
storage.sync normal
storage.checksum off
storage.max_chunks_up 128
storage.backlog.mem_limit 5M
scheduler.cap 30
@INCLUDE application-log.conf
@INCLUDE dataplane-log.conf
@INCLUDE host-log.conf
application-log.conf
[INPUT]
Name tail
Tag application.*
Exclude_Path /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*
Path /var/log/containers/*.log
multiline.parser java, docker, cri
DB /var/fluent-bit/state/flb_container.db
Mem_Buf_Limit 50MB
Skip_Long_Lines Off
Refresh_Interval 5
Rotate_Wait 30
storage.type filesystem
Read_from_Head ${READ_FROM_HEAD}
[INPUT]
Name tail
Tag application.*
Path /var/log/containers/cloudwatch-agent*
multiline.parser docker, cri
DB /var/fluent-bit/state/flb_cwagent.db
Mem_Buf_Limit 5MB
Skip_Long_Lines Off
Refresh_Interval 30
Read_from_Head ${READ_FROM_HEAD}
[FILTER]
Name kubernetes
Match application.*
Kube_URL https://kubernetes.default.svc:443
Kube_Tag_Prefix application.var.log.containers.
Merge_Log On
Merge_Log_Key log_processed
K8S-Logging.Parser On
K8S-Logging.Exclude Off
Labels On
Annotations Off
[OUTPUT]
Name cloudwatch_logs
Match application.*
region ${AWS_REGION}
log_group_name /aws/containerinsights/${CLUSTER_NAME}/application
log_stream_name $(kubernetes['container_name'])-$(kubernetes['pod_name'])
log_stream_template $kubernetes['container_name'].$kubernetes['pod_name']
auto_create_group On
Retry_Limit False
parsers.conf
parsers.conf: |
[MULTILINE_PARSER]
name multiline-regex
type regex
flush_timeout 5
# rules | state name | regex pattern | next state
# ------|---------------|--------------------------------------------
rule "start_state" "/(Dec \d+ \d+\:\d+\:\d+)(.*)/" "cont"
rule "cont" "/^\s+at.*/" "cont"
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%LZ
[PARSER]
Name syslog
Format regex
Regex ^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
Time_Key time
Time_Format %b %d %H:%M:%S
[PARSER]
Name container_firstline
Format regex
Regex (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%LZ
[PARSER]
Name cwagent_firstline
Format regex
Regex (?<log>(?<="log":")\d{4}[\/-]\d{1,2}[\/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%LZ
What I have tried:
I tried sending the logs with both the cloudwatch and the cloudwatch_logs output plugins, but unfortunately I got similar errors with both. I also tried the built-in java parser as well as several custom multiline parsers, with no luck. Typically a pod runs fine for a while and then starts producing these errors, so I suspect it might be related to buffer memory or something similar.
Any ideas or suggestions are welcome.
I have now solved the problem.
The parsers were working fine. The errors were real: one of our pods was emitting a huge amount of logs every half hour (14:30, 15:30, 16:30, and so on).
Even after finding the errors, it was hard to work out exactly which log line was being truncated, because there were so many incoming logs. To solve that, I had to use a Lua script.
Here is my solution, which prints out the logs that get truncated or discarded.
fluent-bit.conf: |
[SERVICE]
Daemon Off
Flush 2
Grace 5
Log_Level warn
Parsers_File parsers.conf
HTTP_Server ${HTTP_SERVER}
HTTP_Listen 0.0.0.0
HTTP_Port ${HTTP_PORT}
storage.path /var/fluent-bit/state/flb-storage/
storage.sync normal
storage.checksum off
storage.max_chunks_up 128
storage.backlog.mem_limit 5M
scheduler.cap 30
@INCLUDE application-log.conf
@INCLUDE dataplane-log.conf
@INCLUDE host-log.conf
get-size.lua: |
function cb_print(tag, timestamp, record)
    if record["log"] ~= nil then
        local log_size = string.len(record["log"])
        record["log_size"] = log_size
        -- Print out all logs that are bigger than 255000 bytes
        if log_size > 255000 then
            print(record["log"])
        end
    end
    return 1, timestamp, record
end
application-log.conf: |
[INPUT]
Name tail
Tag application.*
Exclude_Path /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*
Path /var/log/containers/*.log
multiline.parser java, docker, cri
DB /var/fluent-bit/state/flb_container.db
Mem_Buf_Limit 50MB
Skip_Long_Lines Off
Refresh_Interval 5
Rotate_Wait 30
storage.type filesystem
Read_from_Head ${READ_FROM_HEAD}
[INPUT]
Name tail
Tag application.*
Path /var/log/containers/cloudwatch-agent*
multiline.parser docker, cri
DB /var/fluent-bit/state/flb_cwagent.db
Mem_Buf_Limit 5MB
Skip_Long_Lines Off
Refresh_Interval 30
Read_from_Head ${READ_FROM_HEAD}
[FILTER]
Name lua
Match *
script get-size.lua
call cb_print
[FILTER]
Name kubernetes
Match application.*
Kube_URL https://kubernetes.default.svc:443
Kube_Tag_Prefix application.var.log.containers.
Merge_Log On
Merge_Log_Key log_processed
K8S-Logging.Parser On
K8S-Logging.Exclude Off
Labels On
Annotations Off
[OUTPUT]
Name cloudwatch_logs
Match application.*
region ${AWS_REGION}
log_group_name /aws/containerinsights/${CLUSTER_NAME}/application
log_stream_name $(kubernetes['container_name'])-$(kubernetes['pod_name'])
log_stream_template $kubernetes['container_name'].$kubernetes['pod_name']
auto_create_group On
Retry_Limit False
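For context on the 255000-byte threshold in get-size.lua: the CloudWatch Logs PutLogEvents API caps each event at 256 KiB (262,144 bytes), of which 26 bytes count as per-event overhead, so the largest deliverable message is 262,118 bytes. A small Python sketch of the same size check (the function name is mine; the limits are from the AWS documentation):

```python
# CloudWatch PutLogEvents limits (from the AWS documentation):
CW_MAX_EVENT_BYTES = 256 * 1024  # 262144 bytes per log event
CW_EVENT_OVERHEAD = 26           # bytes of per-event metadata
CW_MAX_MESSAGE = CW_MAX_EVENT_BYTES - CW_EVENT_OVERHEAD  # 262118

def would_truncate(message: str) -> bool:
    """Same check the Lua filter approximates: True when CloudWatch
    would truncate (or discard) a message of this size."""
    return len(message.encode("utf-8")) > CW_MAX_MESSAGE

# The first warning above reported size=294805, which is over the limit:
print(would_truncate("x" * 294805))  # True
```

Printing everything over 255000 bytes, as the Lua script does, therefore flags every record that is at risk while leaving a small safety margin below the hard limit.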
Hope this saves someone a lot of time, as it did for me :)
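If you need to confirm a periodic pattern like the half-hourly bursts described above, you can also pull the truncation warnings out of the Fluent Bit log and bucket them by minute of the hour. A hypothetical sketch (the regex follows the warning format shown at the top of this post; the function name is mine):

```python
import re
from collections import Counter

# Matches the cloudwatch_logs truncation warnings shown above, e.g.:
# [2023/12/05 14:30:14] [ warn] [output:...] [size=294805] Truncating event ...
WARN_RE = re.compile(
    r"\[(\d{4}/\d{2}/\d{2}) (\d{2}):(\d{2}):\d{2}\].*\[size=(\d+)\] Truncating"
)

def truncations_by_hour_minute(lines):
    """Count truncation warnings per HH:MM so a periodic emitter
    (e.g. one firing at :30 past every hour) stands out."""
    counts = Counter()
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            counts["%s:%s" % (m.group(2), m.group(3))] += 1
    return counts

sample = [
    "[2023/12/05 14:30:14] [ warn] [output:cloudwatch_logs:cloudwatch_logs.0] [size=294805] Truncating event which is larger than max size allowed by CloudWatch",
    "[2023/12/05 15:30:19] [ warn] [output:cloudwatch_logs:cloudwatch_logs.0] [size=364675] Truncating event which is larger than max size allowed by CloudWatch",
]
print(truncations_by_hour_minute(sample))  # e.g. Counter({'14:30': 1, '15:30': 1})
```

Feeding it the daemonset's own log (for example via `kubectl logs`) makes the repeating timestamps obvious without reading every warning by hand.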