AWS Glue SerDe classifier seems to be greedy


My log file looks like this:

2023-08-12T13:46:54.577Z (2a25ec26-6b85-49cd-b9c5-b5455d24f71e) Usage Plan check succeeded for API Key *************************4Y01JJ and API Stage kbyw1pi8yl/production
2023-08-12T13:46:54.577Z (2a25ec26-6b85-49cd-b9c5-b5455d24f71e) Starting execution for request: 2a25ec26-6b85-49cd-b9c5-b5455d24f71e
2023-08-12T13:46:54.577Z (2a25ec26-6b85-49cd-b9c5-b5455d24f71e) HTTP Method: POST, Resource Path: /
2023-08-12T13:46:54.578Z (2a25ec26-6b85-49cd-b9c5-b5455d24f71e) API Key: *************************4Y01JJ
2023-08-12T13:46:54.578Z (2a25ec26-6b85-49cd-b9c5-b5455d24f71e) API Key ID: null
2023-08-12T13:46:54.578Z (2a25ec26-6b85-49cd-b9c5-b5455d24f71e) Method request path: {}
2023-08-12T13:46:54.578Z (2a25ec26-6b85-49cd-b9c5-b5455d24f71e) Method request query string: {}
2023-08-12T13:46:54.578Z (2a25ec26-6b85-49cd-b9c5-b5455d24f71e) Method request headers: {tracestate=3144164@nr=0-0-3144164-529450240-3bd89e21623c79ef-330955f8b18b434b-0-0.796626-1691848014574,, x-api-key=*************************4Y01JJ, traceparent=00-a9fc5401ab9b196028a76ca4bc4a87f9-3bd89e21623c79ef-00, X-Forwarded-Proto=https, X-Forwarded-For=13.55.73.250, Host=rtdemo.oztam.com.au, X-Forwarded-Port=443, newrelic=eyJ2IjpbMCwxXSwiZCI6eyJ0eSI6IkFwcCIsImFjIjoiMzE0NDE2NCIsImFwIjoiNTI5NDUwMjQwIiwidHIiOiJhOWZjNTQwMWFiOWIxOTYwMjhhNzZjYTRiYzRhODdmOSIsInByIjowLjc5NjYyNiwic2EiOmZhbHNlLCJ0aSI6MTY5MTg0ODAxNDU3NCwidHgiOiIzMzA5NTVmOGIxOGI0MzRiIiwiaWQiOiIzYmQ4OWUyMTYyM2M3OWVmIn19, X-Amzn-Trace-Id=Root=1-64d78d4e-137c6fce1951101a42bbf380, Content-Type=application/json; charset=utf-8}
2023-08-12T13:46:54.578Z (2a25ec26-6b85-49cd-b9c5-b5455d24f71e) Method request body before transformations: {"publisherId":"f5bb44c-5ad5-4c92-b989-7d17f48e131f","publisherName":"Seven","timestamp":"2023-08-12T13:46:54.5741539+00:00","mediaId":"FWWC23-012","demo1":"7d92f21eae0c4147a12d55cee697a4b5","deviceType":"tv","mediaType":"vod","deviceId":"e96b7203-0295-ff57-6f6d-f3b6031d525d","sessionId":"294650dd-f365-4412-dc7a-9da40ab0d74e","ipAddress":"115.64.23.121"}
2023-08-12T13:46:54.578Z (2a25ec26-6b85-49cd-b9c5-b5455d24f71e) Endpoint request URI: https://lambda.ap-southeast-2.amazonaws.com/2015-03-31/functions/arn:aws:lambda:ap-southeast-2:408865984143:function:oztam-realtime-demo/invocations

I am using a GROK classifier like this:

%{TIMESTAMP_ISO8601:timestamp} \(%{UUID:session_id}\) %{MSGTYPE:msg_type}%{GREEDYDATA:syslog_message}

MSGTYPE (?:(.*?):)?\s?

The goal is to capture 4 fields:

  • timestamp
  • session ID
  • message type
  • the rest of the message

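msg_type should be whatever text comes before the first colon (:), after the timestamp and session ID, which are captured fine. My trouble is with the MSGTYPE regex: it works fine in a regex editor, but in the classifier it always seems to be greedy and matches all the way to the last colon. I have tried many variations.

Note also that some lines have no colon and therefore no msg_type, so in those cases it should be empty.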
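To make the expected and the observed behavior concrete, here is a minimal sketch in Python's re module. It does not reproduce Glue's grok engine. The TIMESTAMP and UUID expressions are simplified stand-ins (assumptions) for the built-in grok patterns, and MSGTYPE_GREEDY is a hypothetical variant added only to show what matching up to the last colon looks like.

```python
import re

# Simplified stand-ins for grok's TIMESTAMP_ISO8601 and UUID (assumptions,
# not the definitions Glue ships with).
TIMESTAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z"
UUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"

MSGTYPE_LAZY = r"(?:(.*?):)?\s?"    # the custom pattern from the question
MSGTYPE_GREEDY = r"(?:(.*):)?\s?"   # hypothetical greedy variant, for comparison


def parse(line, msgtype):
    """Split a log line into the four fields, or return None if it does not match."""
    pattern = re.compile(
        rf"^(?P<timestamp>{TIMESTAMP}) \((?P<session_id>{UUID})\) "
        rf"(?P<msg_type>{msgtype})(?P<syslog_message>.*)$"
    )
    match = pattern.match(line)
    return match.groupdict() if match else None


PREFIX = "2023-08-12T13:46:54.577Z (2a25ec26-6b85-49cd-b9c5-b5455d24f71e) "
samples = [
    PREFIX + "HTTP Method: POST, Resource Path: /",
    PREFIX + "Starting execution for request: 2a25ec26-6b85-49cd-b9c5-b5455d24f71e",
]

for line in samples:
    for name, msgtype in (("lazy", MSGTYPE_LAZY), ("greedy", MSGTYPE_GREEDY)):
        fields = parse(line, msgtype)
        print(name, repr(fields["msg_type"]), repr(fields["syslog_message"]))
```

With the lazy pattern, the first sample yields a msg_type of "HTTP Method: " (the named capture includes the colon and the trailing space). With the greedy variant, msg_type runs up to the last colon and only "/" is left for syslog_message, which is the symptom described above.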

This works against the sample data in grokconstructor.appshot.com.

Has anyone else seen this greediness in Glue classifiers? Any suggestions would be welcome.

Thanks

amazon-web-services classification grok
1 Answer

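After some back and forth with AWS, it appears that the crawler and/or the classifier caches the pattern somewhere. The solution for now is to completely delete and recreate both the crawler and the classifier (this worked for me). That said, deleting and recreating only one of them may be enough.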
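The delete-and-recreate step could be scripted roughly as follows. This is a sketch under assumptions, not something AWS recommended: the function name and its parameters are hypothetical, glue is expected to be a boto3 Glue client, and only the role, database, and targets are carried over from the old crawler. Any schedule, table prefix, or other crawler settings would have to be added by hand.

```python
def recreate_crawler_and_classifier(glue, crawler_name, classifier_name,
                                    classification, grok_pattern, custom_patterns):
    """Delete and recreate both resources so that no stale pattern can survive.

    `glue` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    The crawler must not be running when it is deleted.
    """
    # Save the settings needed to rebuild the crawler before deleting it.
    old = glue.get_crawler(Name=crawler_name)["Crawler"]

    glue.delete_crawler(Name=crawler_name)
    glue.delete_classifier(Name=classifier_name)

    # Recreate the grok classifier with the pattern from the question.
    glue.create_classifier(
        GrokClassifier={
            "Name": classifier_name,
            "Classification": classification,
            "GrokPattern": grok_pattern,
            "CustomPatterns": custom_patterns,
        }
    )

    # Recreate the crawler and attach the fresh classifier to it.
    # Assumes the old crawler had a DatabaseName set.
    glue.create_crawler(
        Name=crawler_name,
        Role=old["Role"],
        DatabaseName=old["DatabaseName"],
        Targets=old["Targets"],
        Classifiers=[classifier_name],
    )


# Example call (names are placeholders, not taken from the question):
# import boto3
# recreate_crawler_and_classifier(
#     boto3.client("glue"),
#     crawler_name="api-gw-logs-crawler",
#     classifier_name="api-gw-logs-grok",
#     classification="apigw-logs",
#     grok_pattern=r"%{TIMESTAMP_ISO8601:timestamp} \(%{UUID:session_id}\) "
#                  r"%{MSGTYPE:msg_type}%{GREEDYDATA:syslog_message}",
#     custom_patterns=r"MSGTYPE (?:(.*?):)?\s?",
# )
```

Passing the client in as an argument keeps the function easy to try against a stub before running it against a real account.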
