I can't dissect my log file because it has a mixed structure, so I'm unable to extract meaningful data from it.
Some sample lines from my log:
2021.04.21 00:00:00.843 INF getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0
2021.04.21 00:00:00.843 INF getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0
2021.04.21 00:00:00.843 INF getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0
2021.04.21 00:00:00.858 INF *** BEGINNING OF ARCCore.performARCTask ***
2021.04.21 00:00:00.858 INF *** BEGINNING OF ARCCore.ProcessTask ***
2021.04.21 00:00:01.266 INF ARCCore.DCI4ARCSyncLogin: login successfully executed. - No error - DCI4ARCSync-CurrSessions/MaxSessions=17/400 CurrProcesses/MaxProcesses=16/250
2021.04.21 00:00:01.297 INF ARCCore.DCI4ARCSyncLogin: login successfully executed. - No error - DCI4ARCSync-CurrSessions/MaxSessions=7/400 CurrProcesses/MaxProcesses=7/250
2021.04.21 00:00:08.165 INF *** BEGINNING OF SYNC ARC TO DC ***--->bIsExternal:0
2021.04.21 00:00:08.434 INF BOC login successfully executed. - No Error - DCI4ARC-CurrSessions/MaxSessions=24/400 CurrProcesses/MaxProcesses=15/250
2021.04.21 00:00:08.635 INF BOCVersionNr ==> V16.1.00.00
2021.04.21 00:00:08.804 INF setEntitySnapshot successfully executed
2021.04.21 00:00:09.453 INF getSnapshotList successfully executed
2021.04.21 00:00:09.461 INF getARCVersion: ARCVersionNr ==> V16.0.00.06
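I wrote a tokenizer that successfully dissects the first three lines of the log, since they match the pattern, but it fails on all the remaining lines.
My tokenizer pattern: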
%{+timestamp} %{+timestamp} %{type} %{msg}: UserName = %{userName}, Password = %{password}, HTTPS=%{https}
Lines that were read successfully:
2021.04.21 00:00:00.843 INF getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0
2021.04.21 00:00:00.843 INF getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0
2021.04.21 00:00:00.843 INF getBaseData: UserName = 'some username', Password = 'some pass', HTTPS=0
Result:
{
  "https": "0",
  "msg": "getBaseData",
  "password": "'20213197'",
  "timestamp": "2021.04.21 00:00:00.843",
  "type": "INF",
  "userName": "'ARC_412_028_01_V01_2021042100000082'"
}
{
  "https": "0",
  "msg": "getBaseData",
  "password": "'20213205'",
  "timestamp": "2021.04.21 00:00:00.843",
  "type": "INF",
  "userName": "'ARC_412_028_01_V01_2021042100000084'"
}
{
  "https": "0",
  "msg": "getBaseData",
  "password": "'20213205'",
  "timestamp": "2021.04.21 00:00:00.843",
  "type": "INF",
  "userName": "'ARC_412_028_01_V01_2021042100000084'"
}
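To see why only the getBaseData lines are parsed, it can help to check a line against the literal delimiters of the tokenizer. The following Python snippet is only a rough approximation of how dissect matches (it is not the Beats implementation); the helper names `delimiters` and `matches` are made up for this illustration.

```python
import re

TOKENIZER = (
    "%{+timestamp} %{+timestamp} %{type} %{msg}: "
    "UserName = %{userName}, Password = %{password}, HTTPS=%{https}"
)


def delimiters(tokenizer):
    """Return the literal text found between the %{key} placeholders."""
    return [d for d in re.split(r"%\{[^}]*\}", tokenizer) if d]


def matches(tokenizer, line):
    """Rough check: every delimiter must occur in the line, in order."""
    pos = 0
    for delim in delimiters(tokenizer):
        idx = line.find(delim, pos)
        if idx == -1:
            return False  # a required delimiter is missing
        pos = idx + len(delim)
    return True


ok_line = ("2021.04.21 00:00:00.843 INF getBaseData: "
           "UserName = 'some username', Password = 'some pass', HTTPS=0")
bad_line = "2021.04.21 00:00:00.858 INF *** BEGINNING OF ARCCore.ProcessTask ***"

print(matches(TOKENIZER, ok_line))
print(matches(TOKENIZER, bad_line))
```

The second line contains no `: UserName = ` delimiter, so a single tokenizer that requires it cannot match that line.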
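I suggest the following. You can define more dissect patterns, but if none of them matches, the log line at least passes through with the basic fields. Depending on the use case, ignore_failure and overwrite_keys may not be needed.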
filebeat.inputs:
  - type: filestream
    processors:
      # Step 1: split every line into timestamp, type, and the remaining content.
      - dissect:
          tokenizer: '%{+timestamp} %{+timestamp} %{type} %{content}'
          field: message
          target_prefix: ""
          trim_values: left
      # Step 2: dissect the content further only for getBaseData lines.
      - dissect:
          when:
            regexp:
              content: '^getBaseData: .*'
          tokenizer: '%{msg}: UserName = %{userName}, Password = %{password}, HTTPS=%{https}'
          field: content
          target_prefix: ""
          ignore_failure: true
          overwrite_keys: true
# Remove the intermediate field once the dissect steps have run.
processors:
  - drop_fields:
      fields: ["content"]
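The two-step idea behind this configuration can be sketched outside Filebeat as well. The Python snippet below is an illustration under assumptions, not what Filebeat runs internally: it uses regular expressions instead of dissect, and the names `BASE`, `BASE_DATA`, and `parse` are hypothetical.

```python
import re

# Step 1: fields shared by every line (date, time, level, rest of the line).
BASE = re.compile(r"^(?P<date>\S+) (?P<time>\S+) (?P<type>\S+) (?P<content>.*)$")

# Step 2: extra fields that only exist on getBaseData lines.
BASE_DATA = re.compile(
    r"^(?P<msg>getBaseData): UserName = (?P<userName>.*?), "
    r"Password = (?P<password>.*?), HTTPS=(?P<https>.*)$"
)


def parse(line):
    """Return the basic fields for any line, plus details when they match."""
    event = {"message": line}
    base = BASE.match(line)
    if base is None:
        return event  # nothing matched: pass the raw line through
    event["timestamp"] = f"{base['date']} {base['time']}"
    event["type"] = base["type"]
    content = base["content"].lstrip()
    detail = BASE_DATA.match(content)
    if detail is not None:
        event.update(detail.groupdict())
    # "content" is never stored on the event, similar to drop_fields.
    return event


print(parse("2021.04.21 00:00:00.843 INF getBaseData: "
            "UserName = 'some username', Password = 'some pass', HTTPS=0"))
print(parse("2021.04.21 00:00:08.635 INF BOCVersionNr ==> V16.1.00.00"))
```

Lines that do not match the second pattern still come out with `timestamp` and `type`, which is the behaviour the configuration above aims for.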
I also found some websites for testing regular expressions and Filebeat dissect patterns: https://regex101.com/r/FGheKd/1