How can I dissect a log file with multiple patterns using Filebeat?


I can't dissect my log file because it has a mixed structure, so I'm unable to extract meaningful data from it.

Some sample lines from my log:

2021.04.21 00:00:00.843  INF  getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0
2021.04.21 00:00:00.843  INF  getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0
2021.04.21 00:00:00.843  INF  getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0
2021.04.21 00:00:00.858  INF  *** BEGINNING OF ARCCore.performARCTask ***
2021.04.21 00:00:00.858  INF  *** BEGINNING OF ARCCore.ProcessTask ***
2021.04.21 00:00:01.266  INF  ARCCore.DCI4ARCSyncLogin: login successfully executed. - No error - DCI4ARCSync-CurrSessions/MaxSessions=17/400 CurrProcesses/MaxProcesses=16/250
2021.04.21 00:00:01.297  INF  ARCCore.DCI4ARCSyncLogin: login successfully executed. - No error - DCI4ARCSync-CurrSessions/MaxSessions=7/400 CurrProcesses/MaxProcesses=7/250
2021.04.21 00:00:08.165  INF  ***  BEGINNING OF SYNC ARC TO DC  ***--->bIsExternal:0
2021.04.21 00:00:08.434  INF  BOC login successfully executed.  - No Error - DCI4ARC-CurrSessions/MaxSessions=24/400 CurrProcesses/MaxProcesses=15/250
2021.04.21 00:00:08.635  INF  BOCVersionNr ==> V16.1.00.00
2021.04.21 00:00:08.804  INF  setEntitySnapshot successfully executed
2021.04.21 00:00:09.453  INF  getSnapshotList successfully executed
2021.04.21 00:00:09.461  INF  getARCVersion: ARCVersionNr ==> V16.0.00.06

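I wrote a tokenizer that successfully dissects the first three lines of the log, because they match the pattern, but it cannot read the rest.

My tokenizer pattern: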

%{+timestamp} %{+timestamp}  %{type}  %{msg}: UserName = %{userName}, Password = %{password}, HTTPS=%{https}

Lines that were read successfully:

2021.04.21 00:00:00.843  INF  getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0
2021.04.21 00:00:00.843  INF  getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0
2021.04.21 00:00:00.843  INF  getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0

Result:

{
  "https": "0",
  "msg": "getBaseData",
  "password": "'20213197'",
  "timestamp": "2021.04.21 00:00:00.843",
  "type": "INF",
  "userName": "'ARC_412_028_01_V01_2021042100000082'"
}
{
  "https": "0",
  "msg": "getBaseData",
  "password": "'20213205'",
  "timestamp": "2021.04.21 00:00:00.843",
  "type": "INF",
  "userName": "'ARC_412_028_01_V01_2021042100000084'"
}
{
  "https": "0",
  "msg": "getBaseData",
  "password": "'20213205'",
  "timestamp": "2021.04.21 00:00:00.843",
  "type": "INF",
  "userName": "'ARC_412_028_01_V01_2021042100000084'"
}
elasticsearch logging elastic-stack filebeat elk
1 Answer

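I suggest the following. You can define more dissect patterns, but if none of them match, the log line still passes through with the basic fields (timestamp and type). Depending on the use case, ignore_failure and overwrite_keys may not be needed.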

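filebeat.inputs:
  - type: filestream
    processors:
      # Stage 1: split every line into timestamp, type and the remaining content.
      - dissect:
          tokenizer: '%{+timestamp} %{+timestamp}  %{type}  %{content}'
          field: message
          target_prefix: ""
          trim_values: left
      # Stage 2: only for getBaseData lines, split the content further.
      - dissect:
          when:
            regexp:
              content: '^getBaseData: .*'
          tokenizer: '%{msg}: UserName = %{userName}, Password = %{password}, HTTPS=%{https}'
          field: content
          target_prefix: ""
          ignore_failure: true
          overwrite_keys: true
# Global processor: remove the temporary content field once dissection is done.
processors: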
  - drop_fields:
      fields: ["content"]
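
The two-stage idea in the config above can be tried offline on the sample lines. The following is only a rough Python sketch, not Filebeat's actual dissect implementation: it approximates a tokenizer with a non-greedy regular expression, and the helper names (compile_tokenizer, dissect, parse_line) are hypothetical. The lstrip call is a loose stand-in for trim_values: left.

```python
import re

# Matches a dissect key such as %{type} or %{+timestamp} (append modifier).
KEY = re.compile(r"%\{(\+?)([^}]*)\}")

BASE = "%{+timestamp} %{+timestamp}  %{type}  %{content}"
DETAIL = "%{msg}: UserName = %{userName}, Password = %{password}, HTTPS=%{https}"


def compile_tokenizer(tokenizer):
    """Turn a tokenizer into (regex, keys); an approximation of dissect."""
    keys = []
    pattern = ""
    pos = 0
    for m in KEY.finditer(tokenizer):
        pattern += re.escape(tokenizer[pos:m.start()])  # literal delimiter
        pattern += "(.*?)"                               # value of this key
        keys.append((m.group(2), m.group(1) == "+"))
        pos = m.end()
    pattern += re.escape(tokenizer[pos:])
    return re.compile("^" + pattern + "$"), keys


def dissect(tokenizer, text):
    """Return a dict of extracted fields, or None if the text does not match."""
    regex, keys = compile_tokenizer(tokenizer)
    m = regex.match(text)
    if m is None:
        return None
    fields = {}
    for (name, append), value in zip(keys, m.groups()):
        value = value.lstrip()
        if append and name in fields:
            fields[name] += " " + value  # %{+key} values are joined by a space
        else:
            fields[name] = value
    return fields


def parse_line(line):
    """Stage 1 extracts the basic fields; stage 2 runs only for getBaseData."""
    event = {"message": line}
    base = dissect(BASE, line)
    if base is None:
        return event                  # nothing matched: keep the raw line only
    event.update(base)
    content = event.pop("content")    # mirrors the drop_fields step
    if content.startswith("getBaseData: "):
        detail = dissect(DETAIL, content)
        if detail is not None:
            event.update(detail)
    return event


if __name__ == "__main__":
    print(parse_line("2021.04.21 00:00:00.843  INF  getBaseData: "
                     "UserName = 'some username', Password = 'some pass', HTTPS=0"))
    print(parse_line("2021.04.21 00:00:08.635  INF  BOCVersionNr ==> V16.1.00.00"))
```

Lines that are not getBaseData entries still come back with timestamp and type, which is the fallback behaviour described above. Further patterns could be added as extra branches in parse_line, in the same way that extra conditional dissect processors could be added to the Filebeat config.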

I also found a couple of sites for testing regular expressions and Filebeat dissect patterns: https://regex101.com/r/FGheKd/1

https://dissect-tester.jorgelbg.me/
