How to configure Fluent Bit and OpenSearch to correctly handle JSON and non-JSON logs

Question

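We have multiple applications running under Kubernetes, written in Python, Go, Ruby, and Elixir. We use Fluent Bit to forward all logs to AWS OpenSearch. All of our components write their logs to STDOUT/STDERR. Some components log in JSON format and some in non-JSON text format. In the OpenSearch UI, the body of a JSON log entry is not parsed into individual fields. Instead, we see some metadata fields followed by one long JSON string. Here is an example: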

This is the full content of the log field, copied from the OpenSearch UI:

2023-01-09T23:41:56.279212506Z stdout F {"level":"WARN","ts":1673307716278.9448,"caller":"internal/internal_task_pollers.go:348","message":"Failed to process workflow task.","Namespace":"ai-platform-dev.59ee7","TaskQueue":"WORKFLOW_QUEUE","WorkerID":"1@workflow-worker-ai-workflow-worker-6c445f59f7-pgn6v@","WorkflowType":"NotesProWorkflow","WorkflowID":"workflow_1169649613530771459_1664751006481316721","RunID":"1ae58130-62d6-4f6a-a6db-8789be13d567","Attempt":12530,"Error":"lookup failed for scheduledEventID to activityID: scheduleEventID: 36, activityID: 36"}

Note that the log field above has some leading "fields" before the embedded JSON string begins. By that I mean this part:

2023-01-09T23:41:56.279212506Z stdout F

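I am starting to suspect that the non-JSON prefix of the log field prevents the es Fluent Bit output plugin from parsing/decoding the JSON content, so the es plugin does not deliver the JSON subfields to OpenSearch.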

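I am considering using a Fluent Bit regex parser to extract only the inner JSON part of the log string, which I assume would then be parsed as JSON and forwarded to OpenSearch as separate fields.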

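I am going to try this parser configuration, which uses a regex to extract the JSON portion of the log string into a new field named capturedJson and then decodes that field as JSON (idea from https://stackoverflow.com/a/66852383/833960):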

[PARSER]
     Format regex
     Name logging-parser
     Regex ^(?<timestamp>.*) (?<stream>.*) .* (?<capturedJson>{.*})$
     Decode_Field json capturedJson
     Time_Format %FT%H:%M:%S,%L
     Time_Key time
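As a rough way to check what this regex would capture, the sketch below applies the same pattern in Python. It is only an approximation of Fluent Bit's behaviour: the named groups are rewritten in Python's (?P<name>...) syntax, the braces are escaped, the sample lines are shortened versions of the log entry above, and extract_json is a hypothetical helper name.

```python
import json
import re

# The regex from the [PARSER] above, with Python-style named groups.
ASKER_REGEX = re.compile(
    r"^(?P<timestamp>.*) (?P<stream>.*) .* (?P<capturedJson>\{.*\})$"
)


def extract_json(line):
    """Return the decoded JSON payload, or None if the regex does not match."""
    match = ASKER_REGEX.match(line)
    if match is None:
        return None
    # Raises json.JSONDecodeError if the captured text is not valid JSON.
    return json.loads(match.group("capturedJson"))


# Shortened sample lines (assumed for illustration).
json_line = (
    "2023-01-09T23:41:56.279212506Z stdout F "
    '{"level":"WARN","message":"Failed to process workflow task.","Attempt":12530}'
)
text_line = "2023-01-09T23:41:56.279212506Z stderr F panic: something went wrong"

print(extract_json(json_line)["level"])  # WARN
print(extract_json(text_line))           # None
```

The second call suggests a limitation of this approach: a plain-text line has no {...} part, so the regex does not match it at all, and such lines would need a separate fallback.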

Components that log in non-JSON format look fine in OpenSearch.

How can I configure Fluent Bit and OpenSearch so that both my JSON and non-JSON components are rendered correctly in OpenSearch?

This is the current Fluent Bit configuration file, shared by all components:

{
    "fluent-bit.conf": "[SERVICE]
            Parsers_File /fluent-bit/parsers/parsers.conf
        
        [INPUT]
            Name              tail
            Tag               kube.*
            Path              /var/log/containers/*.log
            DB                /var/log/flb_kube.db
            Parser            docker
            Docker_Mode       On
            Mem_Buf_Limit     5MB
            Skip_Long_Lines   On
            Refresh_Interval  10
        
        [FILTER]
            Name                kubernetes
            Match               kube.*
            Kube_URL            https://kubernetes.default.svc.cluster.local:443
            Merge_Log           On
            Merge_Log_Key       data
            Keep_Log            On
            K8S-Logging.Parser  On
            K8S-Logging.Exclude On
            Buffer_Size         32k
        [OUTPUT]
            Name            es
            Match           *
            AWS_Region      us-west-2
            AWS_Auth        On
            Host            opensearch.my-domain.com
            Port            443
            TLS             On
            Retry_Limit     6
            Replace_Dots    On
            Index my-index-name
            AWS_STS_Endpoint https://sts.us-west-2.amazonaws.com
        "
}

This is an excerpt from parsers.conf:

bash-4.2# cat parsers.conf 
...
[PARSER]
    Name   json
    Format json
    Time_Key time
    Time_Format %d/%b/%Y:%H:%M:%S %z

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # --
    # Since Fluent Bit v1.2, if you are parsing Docker logs and using
    # the Kubernetes filter, it's not longer required to decode the
    # 'log' key.
    #
    # Command      |  Decoder | Field | Optional Action
    # =============|==================|=================
    #Decode_Field_As    json     log
...
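One possible reading of the symptom: the docker parser above is a JSON parser, while the sample line in the question is in the CRI format used by containerd and CRI-O, which starts with a timestamp rather than a JSON object. The Python sketch below illustrates that difference only. It does not reproduce Fluent Bit internals, parses_as_docker_json is a hypothetical helper, and the Docker-style line is an assumed example of the json-file log format.

```python
import json


def parses_as_docker_json(raw_line):
    """Return True if the whole line is a JSON object, as a JSON parser expects."""
    try:
        return isinstance(json.loads(raw_line), dict)
    except json.JSONDecodeError:
        return False


# Assumed example of a Docker json-file log line.
docker_line = (
    '{"log":"hello\\n","stream":"stdout",'
    '"time":"2023-01-09T23:41:56.279212506Z"}'
)
# Shortened CRI-style line, as seen in the question.
cri_line = '2023-01-09T23:41:56.279212506Z stdout F {"level":"WARN"}'

print(parses_as_docker_json(docker_line))  # True
print(parses_as_docker_json(cri_line))     # False
```

If that reading is right, the whole CRI line ends up as a plain string in log, which would match what the OpenSearch UI shows.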

In OpenSearch, I see the full log payload in a field named log. If I GET the index and look up the definition of the log field, I see:

GET my-index-name
{
}

...
        "log" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
...

Should I change the type of the log field to dynamic? Do I also need to change anything in the Fluent Bit configuration?

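Even components that normally log as JSON sometimes emit non-JSON output to STDERR, for example when an error condition bypasses the application's log handling. Can that be handled as well?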

We are using:

Fluent Bit 1.8.x
OpenSearch 1.3

I think this is relevant to my problem: https://github.com/microsoft/fluentbit-containerd-cri-o-json-log/blob/main/config.yaml

Answer 1

Use the CRI parser as described in the official documentation:

[PARSER]
    # http://rubular.com/r/tjUt3Awgg4
    Name cri
    Format regex
    Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z

[PARSER]
    Name   json
    Format json

Then use Fluent Bit filters to apply the additional parsers to the input:

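[FILTER]
    # Split the raw CRI line stored in "log" into time, stream, logtag and message
    Name parser
    Match *
    Parser cri
    Key_Name log

[FILTER]
    # Decode the "message" captured by the cri parser as JSON; non-JSON lines are left as they are
    Name parser
    Match *
    Parser json
    Key_Name message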
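The two-stage idea in this answer (split the CRI prefix first, then try to decode the remainder as JSON) can be sketched in Python to show how JSON and non-JSON lines would come out. This is an illustration under assumptions, not Fluent Bit's implementation: parse_record is a hypothetical helper, the regex uses Python's (?P<name>...) syntax, and keeping the CRI fields next to the decoded JSON corresponds roughly to setting Reserve_Data On on the second parser filter, which the answer does not include.

```python
import json
import re

# CRI regex from the answer, with Python-style named groups.
CRI_REGEX = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<message>.*)$"
)


def parse_record(raw_line):
    """Mimic 'cri parser, then json parser' with a fallback to plain text."""
    match = CRI_REGEX.match(raw_line)
    if match is None:
        # Not a CRI line: keep the raw text untouched.
        return {"log": raw_line}
    record = match.groupdict()
    try:
        payload = json.loads(record["message"])
    except json.JSONDecodeError:
        # Plain-text message: leave it as a string.
        return record
    if isinstance(payload, dict):
        # JSON object: replace the string with its individual fields.
        del record["message"]
        record.update(payload)
    return record


print(parse_record('2023-01-09T23:41:56.279212506Z stdout F {"level":"WARN"}'))
print(parse_record("2023-01-09T23:41:56.279212506Z stderr F plain text error"))
```

With this ordering, a component that normally logs JSON but occasionally writes plain text to STDERR still produces a usable record: the text simply stays in message.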

Answer 2

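In my case, I am sending logs from AWS ECS to AWS OpenSearch through Fluent Bit, and I have the same problem: my logs arrive as a single string. I don't know how to use a parser in my case, because I send the logs from my ECS task definition.

Could someone help with my situation? There is no proper documentation.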
