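I'm trying to set up Fluent Bit to collect logs from Kubernetes/containerd and ship them to Splunk. Every log our applications produce begins with a fixed start tag and ends with a fixed end tag ([MY_LOG_START] and [MY_LOG_END]). This is consistent across all of our many services and realistically cannot be changed. Here is a sample of our logs: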
[MY_LOG_START] 2023-03-17T12:34:52 INFO This is a single line log.[MY_LOG_END]
[MY_LOG_START] 2023-03-17T12:34:56 AUDIT This is a multi line log
From: some_app
To: some_other_app
Response: 200
Body: {
"hello": "world",
"goodbye": "moon"
}[MY_LOG_END]
[MY_LOG_START] 2023-03-17T12:34:52 WARN This is another single line log.[MY_LOG_END]
I can't figure out how to do this with regex-based multiline parser rules. Here is my parser config, including the several approaches I tried and a comment on each:
[MULTILINE_PARSER]
name multiline-my-log
type regex
flush_timeout 2000
# rules | state name | regex pattern | next state
# ------|---------------|--------------------------------------------
rule "start_state" "/^\[MY_LOG_START\].*$/" "cont"
# This is the closest I got and works fine as long as the log is exactly
# three lines long. The problem is, there's no way I can see to repeat
# the first rule until the second one is matched.
rule "cont" "/^.*$/" "end_state"
rule "end_state" "/^.*\[MY_LOG_END\]$/" "end_state"
# I tried using negative lookahead, but this has the problem that a)
# the wildcard before it seems to just swallow up the whole line and
# make the match not work and b), the final line with the end tag
# would not be picked up as it would not match.
#rule "cont" "/^.*(?!\[MY_LOG_END\].*)$/" "cont"
# I also tried this approach, which did not work, it does not combine
# any of the multiline logs.
#rule "cont" "/^.*[^\[][^M][^Y][^_][^L][^O][^G][^_][^E][^N][^D][^\]].*$/" "cont"
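Point a) above can be checked outside Fluent Bit by running the commented-out lookahead pattern through Python's `re` module. This is a quick sketch, assuming Python's engine treats this simple pattern (greedy `.*`, negative lookahead, anchors) the same way Fluent Bit's regex engine does. The `anchored` variant is a hypothetical rewrite added only for comparison; it is not part of the original config.

```python
import re

# The commented-out "cont" attempt, with the "/.../" delimiters removed.
as_written = re.compile(r"^.*(?!\[MY_LOG_END\].*)$")

# Hypothetical variant (not in the original config) with the lookahead
# moved to the start of the line, for comparison.
anchored = re.compile(r"^(?!.*\[MY_LOG_END\]).*$")

LINES = ["From: some_app", "}[MY_LOG_END]"]


def matches(pattern):
    """Return whether the pattern matches each line in LINES."""
    return [pattern.match(line) is not None for line in LINES]


if __name__ == "__main__":
    # Greedy .* consumes the whole line, then the lookahead is checked at
    # end of line, where nothing is left to match, so every line is accepted.
    print("as written:", matches(as_written))  # [True, True]
    # With the lookahead at the start, the end-tag line is rejected instead.
    print("anchored:  ", matches(anchored))    # [True, False]
```

So the pattern as written never rejects anything, while moving the lookahead to the front runs straight into problem b): the line carrying `[MY_LOG_END]` then matches no continuation rule.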
In case it's relevant, here is the main Fluent Bit config I'm using:
[INPUT]
Name tail
Path /var/log/containers/*.log
# To get this working with Kubernetes, as per the following
# article, we essentially need two multiline detections, the second (custom)
# one of which must therefore happen in a FILTER:
# https://www.appsloveworld.com/springboot/100/10/can-not-get-to-work-fluentbit-multi-line-parser-in-k8s-env
multiline.parser cri
Tag kube.*
Buffer_Max_Size 1MB
Mem_Buf_Limit 10MB
Skip_Long_Lines Off
Read_from_Head True
Refresh_Interval 1
[FILTER]
name multiline
match *
multiline.parser multiline-my-log
multiline.key_content log
[OUTPUT]
Name file
Match kube.*
Path /var/lib/docker/containers
Format plain
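It works for me (Fluent Bit 2.2.2) using a variation of your negative lookahead idea. Basically, anything that doesn't look like a start line should be treated as a continuation: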
/^(?!\[MY_LOG_START\] ).*$/
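A quick way to sanity-check this pattern is to run it against the sample lines (without CRI headers, since the `cri` parser has already stripped them by the time the multiline filter sees the `log` key). The sketch below assumes Python's `re` treats the pattern the same way Fluent Bit does; `is_continuation` is a hypothetical helper name.

```python
import re

# Continuation pattern from the answer, with the "/.../" delimiters removed.
CONT = re.compile(r"^(?!\[MY_LOG_START\] ).*$")

SAMPLE = [
    "[MY_LOG_START] 2023-03-17T12:34:52 INFO This is a single line log.[MY_LOG_END]",
    "[MY_LOG_START] 2023-03-17T12:34:56 AUDIT This is a multi line log",
    "From: some_app",
    "To: some_other_app",
    "Response: 200",
    "Body: {",
    '"hello": "world",',
    '"goodbye": "moon"',
    "}[MY_LOG_END]",
    "[MY_LOG_START] 2023-03-17T12:34:52 WARN This is another single line log.[MY_LOG_END]",
]


def is_continuation(line: str) -> bool:
    """True when the line does NOT begin with the start tag plus a space."""
    return CONT.match(line) is not None


if __name__ == "__main__":
    for line in SAMPLE:
        print(f"{is_continuation(line)!s:5}  {line}")
```

Note the trailing space inside the lookahead: a line starting with `[MY_LOG_START]` followed by anything other than a space would be classified as a continuation.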
Parser:
[MULTILINE_PARSER]
name multiline-my-log
key_content log
type regex
flush_timeout 2000
rule "start_state" "/^\[MY_LOG_START\] .*$/" "cont"
rule "cont" "/^(?!\[MY_LOG_START\] ).*$/" "cont"
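To see why these two rules handle records of any length, the grouping can be modeled in a few lines of Python. This is a simplified model, not Fluent Bit's actual implementation: it assumes a `start_state` match always opens a new record, a `cont` match is appended to the open record, and whatever is still buffered is flushed at end of input, standing in for `flush_timeout`. The function name `group_records` is hypothetical.

```python
import re
from typing import Iterable, List

# Regexes from the two rules in the answer's [MULTILINE_PARSER].
START = re.compile(r"^\[MY_LOG_START\] .*$")
CONT = re.compile(r"^(?!\[MY_LOG_START\] ).*$")


def group_records(lines: Iterable[str]) -> List[str]:
    """Group lines the way start_state -> cont -> cont ... would."""
    records: List[str] = []
    current: List[str] = []
    for line in lines:
        if START.match(line):
            # A start line closes any open record and opens a new one.
            if current:
                records.append("\n".join(current))
            current = [line]
        elif current and CONT.match(line):
            # "cont" loops back to itself, so any number of lines is absorbed.
            current.append(line)
        else:
            # A non-start line with no open record is emitted on its own.
            records.append(line)
    # Stand-in for flush_timeout: emit whatever is still buffered.
    if current:
        records.append("\n".join(current))
    return records


SAMPLE = [
    "[MY_LOG_START] 2023-03-17T12:34:52 INFO This is a single line log.[MY_LOG_END]",
    "[MY_LOG_START] 2023-03-17T12:34:56 AUDIT This is a multi line log",
    "From: some_app",
    "To: some_other_app",
    "Response: 200",
    "Body: {",
    '"hello": "world",',
    '"goodbye": "moon"',
    "}[MY_LOG_END]",
    "[MY_LOG_START] 2023-03-17T12:34:52 WARN This is another single line log.[MY_LOG_END]",
]

if __name__ == "__main__":
    for record in group_records(SAMPLE):
        print(repr(record))
```

Because the `cont` pattern is the exact complement of the start pattern, the end tag never needs to be matched at all. The last record in a stream is only emitted when the next start line arrives or `flush_timeout` expires, since nothing else closes it.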
Config:
[SERVICE]
Log_Level info
[INPUT]
Name tail
Tag test
Path /opt/homebrew/Cellar/fluent-bit/2.2.2/etc/fluent-bit/test.log
Read_from_Head On
multiline.parser cri
[FILTER]
Name multiline
Match *
multiline.parser multiline-my-log
multiline.key_content log
[OUTPUT]
name stdout
match *
Sample log file test.log:
2024-02-02T13:00:00.000000001Z stdout F [MY_LOG_START] 2023-03-17T12:34:52 INFO This is a single line log.[MY_LOG_END]
2024-02-02T13:00:00.000000002Z stdout F [MY_LOG_START] 2023-03-17T12:34:56 AUDIT This is a multi line log
2024-02-02T13:00:00.000000003Z stdout F From: some_app
2024-02-02T13:00:00.000000004Z stdout F To: some_other_app
2024-02-02T13:00:00.000000005Z stdout F Response: 200
2024-02-02T13:00:00.000000006Z stdout F Body: {
2024-02-02T13:00:00.000000007Z stdout F "hello": "world",
2024-02-02T13:00:00.000000008Z stdout F "goodbye": "moon"
2024-02-02T13:00:00.000000009Z stdout F }[MY_LOG_END]
2024-02-02T13:00:00.000000010Z stdout F [MY_LOG_START] 2023-03-17T12:34:52 WARN This is another single line log.[MY_LOG_END]
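The CRI header on each line has the layout `<time> <stream> <P|F> <message>`, which is where the `time`, `stream`, `_p` and `log` keys in the result below come from. The following sketch approximates what `multiline.parser cri` does before the custom filter runs: it splits those fields and joins partial (`P`) fragments until a final (`F`) fragment arrives. The regex and the `parse_cri` helper are assumptions for illustration, not taken from Fluent Bit's source.

```python
import re

# Assumed approximation of the CRI line layout: "<time> <stream> <P|F> <message>".
CRI = re.compile(r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<_p>[PF]) (?P<log>.*)$")


def parse_cri(lines):
    """Split CRI lines into fields, joining P (partial) fragments up to the F one."""
    records = []
    pending = []
    for line in lines:
        m = CRI.match(line)
        if m is None:
            continue  # skip lines that are not in CRI format
        fields = m.groupdict()
        pending.append(fields["log"])
        if fields["_p"] == "F":
            # F marks the final fragment: emit the joined message.
            fields["log"] = "".join(pending)
            records.append(fields)
            pending = []
    return records


SAMPLE = [
    "2024-02-02T13:00:00.000000001Z stdout F [MY_LOG_START] 2023-03-17T12:34:52 INFO This is a single line log.[MY_LOG_END]",
    "2024-02-02T13:00:00.000000002Z stdout F [MY_LOG_START] 2023-03-17T12:34:56 AUDIT This is a multi line log",
    "2024-02-02T13:00:00.000000003Z stdout F From: some_app",
]

if __name__ == "__main__":
    for record in parse_cri(SAMPLE):
        print(record)
```

Since every sample line is marked `F`, each one comes out as its own record here; joining them into one log entry is left entirely to the custom multiline filter.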
Result:
[2024/02/02 19:54:51] [ info] [fluent bit] version=2.2.2, commit=, pid=33432
[2024/02/02 19:54:51] [ info] [storage] ver=1.5.1, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/02/02 19:54:51] [ info] [cmetrics] version=0.6.6
[2024/02/02 19:54:51] [ info] [ctraces ] version=0.4.0
[2024/02/02 19:54:51] [ info] [input:tail:tail.0] initializing
[2024/02/02 19:54:51] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2024/02/02 19:54:51] [ info] [input:tail:tail.0] multiline core started
[2024/02/02 19:54:51] [ info] [filter:multiline:multiline.0] created emitter: emitter_for_multiline.0
[2024/02/02 19:54:51] [ info] [input:emitter:emitter_for_multiline.0] initializing
[2024/02/02 19:54:51] [ info] [input:emitter:emitter_for_multiline.0] storage_strategy='memory' (memory only)
[2024/02/02 19:54:51] [ info] [sp] stream processor started
[2024/02/02 19:54:51] [ info] [output:stdout:stdout.0] worker #0 started
[2024/02/02 19:54:51] [ info] [filter:multiline:multiline.0] created new multiline stream for tail.0_test
[0] test: [[1706878800.000000001, {}], {"time"=>"2024-02-02T13:00:00.000000001Z", "stream"=>"stdout", "_p"=>"F", "log"=>"[MY_LOG_START] 2023-03-17T12:34:52 INFO This is a single line log.[MY_LOG_END]"}]
[1] test: [[1706878800.000000002, {}], {"time"=>"2024-02-02T13:00:00.000000002Z", "stream"=>"stdout", "_p"=>"F", "log"=>"[MY_LOG_START] 2023-03-17T12:34:56 AUDIT This is a multi line log
From: some_app
To: some_other_app
Response: 200
Body: {
"hello": "world",
"goodbye": "moon"
}[MY_LOG_END]"}]
[0] test: [[1706878800.000000010, {}], {"time"=>"2024-02-02T13:00:00.000000010Z", "stream"=>"stdout", "_p"=>"F", "log"=>"[MY_LOG_START] 2023-03-17T12:34:52 WARN This is another single line log.[MY_LOG_END]"}]
Note that I added CRI headers to the sample data so that I could enable multiline.parser cri in the tail input config, which makes the test more realistic.