Fluent-bit: Extract two categories of lines from a single log file into two outputs

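Overall goal

I have a huge application-specific log file that is easy to parse line by line. It contains two types of log lines that I want to tail with fluent-bit and extract for further processing in a time series database, Elastic, etc.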

  1. The first category of interest is "RECEIVED" lines, which indicate an incoming request:
    2023-03-25 00:33:17 <  43084:X@540435747125056> RECEIVED /req:10.45.3.24(10.45.3.24):user9458:service8457:
  2. The second category of interest is "DURATION" lines, which indicate that we have responded to the request:
    2023-03-25 00:33:17 <  43084:X@540435747125056> DURATION TO RESPONSE  15.178 ms::user9458:service8457:

Most lines, like the one below, are of no interest for further processing at this point and should be omitted:

    2023-03-25 00:33:17 <  43084:X@540435747125056> <X:   > request_details_of_no_interest
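The three kinds of lines above can be told apart by keyword alone. The following is a minimal Python sketch of that pre-filtering step, not fluent-bit configuration; the names `KEEP`, `classify`, and the category labels are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pre-filter: keep only lines containing one of the two markers.
KEEP = re.compile(r"RECEIVED|DURATION TO RESPONSE")

def classify(line: str) -> str:
    """Return 'incoming', 'done', or 'ignored' for a single log line."""
    m = KEEP.search(line)
    if m is None:
        return "ignored"
    return "incoming" if m.group(0) == "RECEIVED" else "done"

# The three sample lines quoted in the question.
SAMPLES = [
    "2023-03-25 00:33:17 <  43084:X@540435747125056> RECEIVED /req:10.45.3.24(10.45.3.24):user9458:service8457:",
    "2023-03-25 00:33:17 <  43084:X@540435747125056> DURATION TO RESPONSE  15.178 ms::user9458:service8457:",
    "2023-03-25 00:33:17 <  43084:X@540435747125056> <X:   > request_details_of_no_interest",
]

if __name__ == "__main__":
    for line in SAMPLES:
        print(classify(line))
```

This only checks for the marker text anywhere in the line; a line of no interest that happened to contain the word RECEIVED would also be kept.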

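Desired outcome

To start with, I just want to dump the output into two separate files, and I expect the following results for the two example lines: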

  1. incoming: [1679700797.000000000, {"ip":"10.45.3.24","user":"user9458","service":"service8457"}]

  2. done: [1679700797.000000000, {"duration_ms":"15.178","user":"user9458","service":"service8457"}]
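The epoch value in the expected records matches the sample timestamp only if the log time is read as UTC+1. That offset is inferred from the numbers, not stated anywhere in the question, so the sketch below treats it as an assumption; `ASSUMED_TZ` and `to_epoch` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log timestamps are local time at UTC+1 (inferred, not stated).
ASSUMED_TZ = timezone(timedelta(hours=1))

def to_epoch(stamp: str, tz: timezone = ASSUMED_TZ) -> float:
    """Convert a 'YYYY-MM-DD HH:MM:SS' log timestamp to Unix epoch seconds."""
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return dt.timestamp()

if __name__ == "__main__":
    # Same nine-decimal layout as the expected records.
    print(f"{to_epoch('2023-03-25 00:33:17'):.9f}")  # 1679700797.000000000
```

Read as UTC instead, the same timestamp would be 3600 seconds later (1679704397).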

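Ideally, this would be achieved in a stacked manner, handling each of the two (or more) line types with its own, simpler regex. So, if possible, I would like to avoid a "more complex" regex that matches both line types at once and exposes all capture groups as keys.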

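Additional requirement: the tailed log file /opt/app/logs/http.log is rotated daily by moving it into the subfolder /opt/app/logs/archived and then gzip-compressing it, while a new http.log is created.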

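My current failing state

So far, I either get output in both output files in the form of timestamp + log message (using the parser "my_basic" below), or I succeed for category 1 but get no output at all for category 2 :-(

Please find my patterns.conf: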

[PARSER]
    Name    my_basic
    Format  regex
    Regex   ^(?<time>\d+-\d+-\d+ \d+:\d+:\d+)\ (?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S

[PARSER]
    Name    my_incoming
    Format  regex
    Regex   ^(?<time>\d+-\d+-\d+ \d+:\d+:\d+)\ .*\((?<ip>\d+\.\d+\.\d+\.\d+)\):(?<user>[^:]+):(?<service>[^:]+):.*$
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S

[PARSER]
    Name    my_done
    Format  regex
    Regex   ^(?<time>\d+-\d+-\d+\ \d+:\d+:\d+)\ .*DURATION\ TO\ RESPONSE\s+(?<duration>[\d\.]+)\ ms:.*:(?<user>[^:]+):(?<service>[^:]+):.*$
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S
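As a sanity check outside fluent-bit, the `my_incoming` and `my_done` patterns can be tried in order against the sample lines. This is a rough Python sketch under two assumptions: the named groups are rewritten from `(?<name>...)` to Python's `(?P<name>...)` syntax, and Python's `re` engine behaves like fluent-bit's regex engine for these patterns. `PARSERS` and `parse` are hypothetical names.

```python
import re

# Shared timestamp prefix, as in the parsers above (Python group syntax).
TIME = r"(?P<time>\d+-\d+-\d+ \d+:\d+:\d+)"

# Parsers are tried in this order; the first one that matches wins.
PARSERS = [
    ("my_incoming", re.compile(
        r"^" + TIME + r" .*\((?P<ip>\d+\.\d+\.\d+\.\d+)\)"
        r":(?P<user>[^:]+):(?P<service>[^:]+):.*$")),
    ("my_done", re.compile(
        r"^" + TIME + r" .*DURATION TO RESPONSE\s+(?P<duration>[\d.]+) ms"
        r":.*:(?P<user>[^:]+):(?P<service>[^:]+):.*$")),
]

def parse(line: str):
    """Return (parser_name, fields) for the first matching parser, else None."""
    for name, rx in PARSERS:
        m = rx.match(line)
        if m:
            return name, m.groupdict()
    return None

RECEIVED = ("2023-03-25 00:33:17 <  43084:X@540435747125056> "
            "RECEIVED /req:10.45.3.24(10.45.3.24):user9458:service8457:")
DURATION = ("2023-03-25 00:33:17 <  43084:X@540435747125056> "
            "DURATION TO RESPONSE  15.178 ms::user9458:service8457:")
NOISE = ("2023-03-25 00:33:17 <  43084:X@540435747125056> "
         "<X:   > request_details_of_no_interest")

if __name__ == "__main__":
    for line in (RECEIVED, DURATION, NOISE):
        print(parse(line))
```

Under these assumptions each sample line matches only its own pattern and the uninteresting line matches neither, which suggests the missing category 2 output is more likely related to how records are tagged and routed than to the regexes themselves.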

And the relevant part of fluent-bit.conf:

[INPUT]
    name tail
    path /opt/app/logs/http.log*
    Tag  node1.market1.app
    #Parser my_basic
    Parser my_incoming
    DB /opt/app/logs/file_status.db

    # Read interval (sec) Default: 1
    Refresh_Interval 10

    # I would like to know if such a grep section would help in terms of performance or consuming less resources, but currently I have more basic problems ...
    #[FILTER]
    #    name grep
    #    match *
    #    regex message RECEIVED|DURATION\ TO\ RESPONSE

[FILTER]
    Name rewrite_tag
    Match *.app
    #Rule $message RECEIVED my_incoming true
    Rule $ip \d+\.\d+\.\d+\.\d+ my_incoming true
    Emitter_Name my_incoming

    # Evidence of one of my dozen failed attempts so far
    #[FILTER]
    #    Name Parser
    #    match my_incoming
    #    Key_Name message
    #    Parser my_incoming

[FILTER]
    Name rewrite_tag
    match *.app
    Rule $ip ^$ my_done false
    Emitter_Name my_done

[FILTER]
    Name Parser
    match my_done
    Key_Name duration
    Parser my_done

[OUTPUT]
    name  file
    match my_incoming
    path /opt/app/logs/extracted_incoming_reqs

[OUTPUT]
    name  file
    match my_done
    path /opt/app/logs/extracted_incoming_done