Fluent Bit rules for multiline logs with start and end tags


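I'm trying to set up Fluent Bit to pick up logs from Kubernetes/containerd and ship them to Splunk. The logs our applications produce all begin with a fixed start tag and end with a fixed end tag ([MY_LOG_START] and [MY_LOG_END]); this is consistent across all of our many services and cannot realistically be changed. Here is an example of our logs: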

[MY_LOG_START] 2023-03-17T12:34:52 INFO This is a single line log.[MY_LOG_END]
[MY_LOG_START] 2023-03-17T12:34:56 AUDIT This is a multi line log
From: some_app
To: some_other_app
Response: 200
Body: {
  "hello": "world",
  "goodbye": "moon"
}[MY_LOG_END]
[MY_LOG_START] 2023-03-17T12:34:52 WARN This is another single line log.[MY_LOG_END]

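I'm using regex rules for multiline parsing, but I can't figure out how to handle this. Here is my parser config, which includes the different approaches I tried, with a comment on each.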

[MULTILINE_PARSER]
    name multiline-my-log
    type regex
    flush_timeout 2000

    # rules |   state name  | regex pattern                  | next state
    # ------|---------------|--------------------------------------------
    rule      "start_state"   "/^\[MY_LOG_START\].*$/"         "cont"

    # This is the closest I got and works fine as long as the log is exactly
    # three lines long. The problem is, there's no way I can see to repeat
    # the first rule until the second one is matched.
    rule      "cont"          "/^.*$/"                         "end_state"
    rule      "end_state"     "/^.*\[MY_LOG_END\]$/"           "end_state"

    # I tried using negative lookahead, but this has the problem that a)
    # the wildcard before it seems to just swallow up the whole line and
    # make the match not work and b), the final line with the end tag
    # would not be picked up as it would not match.
    #rule      "cont"          "/^.*(?!\[MY_LOG_END\].*)$/"    "cont"

    # I also tried this approach, which did not work, it does not combine
    # any of the multiline logs.
    #rule      "cont"          "/^.*[^\[][^M][^Y][^_][^L][^O][^G][^_][^E][^N][^D][^\]].*$/"  "cont"
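
To see why the two commented-out attempts behave as described, the patterns can be tried outside Fluent Bit. The sketch below uses Python's `re` module as a stand-in for the Onigmo engine that Fluent Bit uses; the constructs involved (greedy `.*`, negative lookahead, negated character classes) should behave the same way in both, but that equivalence is an assumption here. The variable names are illustrative only.

```python
import re

# The commented-out lookahead attempt from the config above.
trailing_lookahead = re.compile(r"^.*(?!\[MY_LOG_END\].*)$")
# The character-class attempt from the config above.
char_classes = re.compile(
    r"^.*[^\[][^M][^Y][^_][^L][^O][^G][^_][^E][^N][^D][^\]].*$"
)

end_line = "}[MY_LOG_END]"
short_line = "Body: {"

# Greedy .* consumes the whole line first, so the lookahead is checked at
# end of line, where no tag can follow: every line matches, end tag or not.
lookahead_matches_end = bool(trailing_lookahead.match(end_line))

# Twelve negated classes need twelve consecutive characters that each differ
# from one tag character at the same offset. That is not "line lacks the tag".
classes_match_end = bool(char_classes.match(end_line))
classes_match_short = bool(char_classes.match(short_line))

print(lookahead_matches_end, classes_match_end, classes_match_short)
# True True False
```

In this sketch the lookahead pattern cannot tell the end line apart from any other line, and the character-class pattern accepts the end line while rejecting any line shorter than twelve characters, such as `Body: {`.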

In case it's relevant, here is the main Fluent Bit config I'm using:

[INPUT]
    Name tail
    Path /var/log/containers/*.log
    # To get this working with Kubernetes, as per the following
    # article, we essentially need two multiline detections, the second (custom)
    # one of which must therefore happen in a FILTER:
    # https://www.appsloveworld.com/springboot/100/10/can-not-get-to-work-fluentbit-multi-line-parser-in-k8s-env
    multiline.parser cri
    Tag kube.*
    Buffer_Max_Size 1MB
    Mem_Buf_Limit 10MB
    Skip_Long_Lines Off
    Read_from_Head True
    Refresh_Interval 1

[FILTER]
    name                  multiline
    match                 *
    multiline.parser      multiline-my-log
    multiline.key_content log

[OUTPUT]
    Name file
    Match kube.*
    Path /var/lib/docker/containers
    Format plain

Tags: logging, multiline, fluent-bit

1 Answer

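This worked for me using a variation of your negative lookahead idea (Fluent Bit 2.2.2). Essentially, anything that does not look like a start line is treated as a continuation: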

/^(?!\[MY_LOG_START\] ).*$/
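
As a quick check of that pattern outside Fluent Bit, here is a minimal sketch in Python. It assumes Python's `re` treats this lookahead the same way as the regex engine inside Fluent Bit; `CONT` and `is_continuation` are names made up for the example.

```python
import re

# Continuation pattern from the answer, written as a Python raw string.
CONT = re.compile(r"^(?!\[MY_LOG_START\] ).*$")

lines = [
    "[MY_LOG_START] 2023-03-17T12:34:56 AUDIT This is a multi line log",
    "From: some_app",
    "Body: {",
    "}[MY_LOG_END]",
]

# The lookahead sits right after ^, so it inspects the start of the line
# before .* consumes anything.
is_continuation = [bool(CONT.match(line)) for line in lines]
print(is_continuation)  # [False, True, True, True]
```

Only the line that begins with the start tag is rejected; every other line, including the one carrying the end tag, counts as a continuation.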

Parser:

[MULTILINE_PARSER]
    name          multiline-my-log
    key_content   log
    type          regex
    flush_timeout 2000
    rule     "start_state"   "/^\[MY_LOG_START\] .*$/"          "cont"
    rule     "cont"          "/^(?!\[MY_LOG_START\] ).*$/"      "cont"
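
The way these two rules group lines can be sketched as a small state machine in Python. This is a simplified model written for illustration, not Fluent Bit's actual implementation: it ignores flush_timeout (the final flush at end of input stands in for it), and `group_records` is a hypothetical helper name.

```python
import re

START = re.compile(r"^\[MY_LOG_START\] .*$")
CONT = re.compile(r"^(?!\[MY_LOG_START\] ).*$")

def group_records(lines):
    """Group lines into records using the start_state and cont rules."""
    records = []
    current = None  # lines of the record being assembled, or None
    for line in lines:
        if current is not None and CONT.match(line):
            current.append(line)  # "cont" rule matched: stay in "cont"
            continue
        if current is not None:
            records.append("\n".join(current))  # a new start ends the record
            current = None
        if START.match(line):
            current = [line]  # "start_state" rule matched
        else:
            records.append(line)  # no record open: pass the line through
    if current is not None:
        records.append("\n".join(current))  # stands in for flush_timeout
    return records

sample = [
    "[MY_LOG_START] 2023-03-17T12:34:52 INFO This is a single line log.[MY_LOG_END]",
    "[MY_LOG_START] 2023-03-17T12:34:56 AUDIT This is a multi line log",
    "From: some_app",
    "To: some_other_app",
    "Response: 200",
    "Body: {",
    '  "hello": "world",',
    '  "goodbye": "moon"',
    "}[MY_LOG_END]",
    "[MY_LOG_START] 2023-03-17T12:34:52 WARN This is another single line log.[MY_LOG_END]",
]

records = group_records(sample)
print(len(records))  # 3
```

Note that the end tag plays no part in the grouping: a record is closed by the next start line (or by the timeout), which is why no rule for [MY_LOG_END] is needed.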

Config:

[SERVICE]
    Log_Level         info

[INPUT]
    Name              tail
    Tag               test
    Path              /opt/homebrew/Cellar/fluent-bit/2.2.2/etc/fluent-bit/test.log
    Read_from_Head    On
    multiline.parser  cri

[FILTER]
    Name                  multiline
    Match                 *
    multiline.parser      multiline-my-log
    multiline.key_content log

[OUTPUT]
    name  stdout
    match *

Sample log "test.log":

2024-02-02T13:00:00.000000001Z stdout F [MY_LOG_START] 2023-03-17T12:34:52 INFO This is a single line log.[MY_LOG_END]
2024-02-02T13:00:00.000000002Z stdout F [MY_LOG_START] 2023-03-17T12:34:56 AUDIT This is a multi line log
2024-02-02T13:00:00.000000003Z stdout F From: some_app
2024-02-02T13:00:00.000000004Z stdout F To: some_other_app
2024-02-02T13:00:00.000000005Z stdout F Response: 200
2024-02-02T13:00:00.000000006Z stdout F Body: {
2024-02-02T13:00:00.000000007Z stdout F   "hello": "world",
2024-02-02T13:00:00.000000008Z stdout F   "goodbye": "moon"
2024-02-02T13:00:00.000000009Z stdout F }[MY_LOG_END]
2024-02-02T13:00:00.000000010Z stdout F [MY_LOG_START] 2023-03-17T12:34:52 WARN This is another single line log.[MY_LOG_END]

Result:

[2024/02/02 19:54:51] [ info] [fluent bit] version=2.2.2, commit=, pid=33432
[2024/02/02 19:54:51] [ info] [storage] ver=1.5.1, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/02/02 19:54:51] [ info] [cmetrics] version=0.6.6
[2024/02/02 19:54:51] [ info] [ctraces ] version=0.4.0
[2024/02/02 19:54:51] [ info] [input:tail:tail.0] initializing
[2024/02/02 19:54:51] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2024/02/02 19:54:51] [ info] [input:tail:tail.0] multiline core started
[2024/02/02 19:54:51] [ info] [filter:multiline:multiline.0] created emitter: emitter_for_multiline.0
[2024/02/02 19:54:51] [ info] [input:emitter:emitter_for_multiline.0] initializing
[2024/02/02 19:54:51] [ info] [input:emitter:emitter_for_multiline.0] storage_strategy='memory' (memory only)
[2024/02/02 19:54:51] [ info] [sp] stream processor started
[2024/02/02 19:54:51] [ info] [output:stdout:stdout.0] worker #0 started
[2024/02/02 19:54:51] [ info] [filter:multiline:multiline.0] created new multiline stream for tail.0_test
[0] test: [[1706878800.000000001, {}], {"time"=>"2024-02-02T13:00:00.000000001Z", "stream"=>"stdout", "_p"=>"F", "log"=>"[MY_LOG_START] 2023-03-17T12:34:52 INFO This is a single line log.[MY_LOG_END]"}]
[1] test: [[1706878800.000000002, {}], {"time"=>"2024-02-02T13:00:00.000000002Z", "stream"=>"stdout", "_p"=>"F", "log"=>"[MY_LOG_START] 2023-03-17T12:34:56 AUDIT This is a multi line log
From: some_app
To: some_other_app
Response: 200
Body: {
  "hello": "world",
  "goodbye": "moon"
}[MY_LOG_END]"}]
[0] test: [[1706878800.000000010, {}], {"time"=>"2024-02-02T13:00:00.000000010Z", "stream"=>"stdout", "_p"=>"F", "log"=>"[MY_LOG_START] 2023-03-17T12:34:52 WARN This is another single line log.[MY_LOG_END]"}]

Note that I added CRI headers to the sample data so that I could enable multiline.parser cri in the tail input config, which makes the test more realistic.
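
If you want to build a similar test file from your own raw log lines, a small script can add the headers. The sketch below is only an illustration: `add_cri_headers` is a hypothetical helper, the base timestamp is copied from the sample, and every line is tagged `F` (full line) as in the sample data.

```python
def add_cri_headers(lines, base="2024-02-02T13:00:00", stream="stdout"):
    """Prefix each raw line with '<timestamp> <stream> F ' as in test.log."""
    out = []
    for i, line in enumerate(lines, start=1):
        # Nine fractional digits; the counter only keeps timestamps distinct.
        out.append(f"{base}.{i:09d}Z {stream} F {line}")
    return out

raw = ["From: some_app", "To: some_other_app"]
for cri_line in add_cri_headers(raw):
    print(cri_line)
```

Writing the returned lines to a file, one per line, gives input in the same shape as the test.log shown above.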
