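We have been trying to configure Fluent Bit for log aggregation on our k8s clusters, using the newRelic bundle Helm charts. The newRelic bundle creates one pod on each K8s cluster node and processes logs according to the defined configuration. Everything seems to work fine except stack trace concatenation. The problem is as follows:
We run 4 pods on a single node, and at runtime they create the following log files under the "/var/log/containers" directory.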
myapp-svc1-<pod-id>.log
myapp-svc2-<pod-id>.log
myapp-svc3-<pod-id>.log
myapp-svc4-<pod-id>.log
Here is the configuration:
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level ${LOG_LEVEL}
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
[INPUT]
Name tail
Tag kube.*
Path ${PATH}
Parser ${LOG_PARSER}
DB ${FB_DB}
Mem_Buf_Limit 7MB
Skip_Long_Lines On
Refresh_Interval 10
[FILTER]
Name multiline
Match *
multiline.key_content log
multiline.parser multiline-regex-error-trace
[FILTER]
Name kubernetes
Match kube.*
# We need the full DNS suffix as Windows only supports resolving names with this suffix
# See: https://kubernetes.io/docs/setup/production-environment/windows/intro-windows-in-kubernetes/#dns-limitations
Kube_URL https://kubernetes.default.svc.cluster.local:443
Buffer_Size ${K8S_BUFFER_SIZE}
K8S-Logging.Exclude ${K8S_LOGGING_EXCLUDE}
[FILTER]
Name record_modifier
Match *
Record cluster_name ${CLUSTER_NAME}
Allowlist_key container_name
Allowlist_key namespace_name
Allowlist_key pod_name
Allowlist_key stream
Allowlist_key message
Allowlist_key log
Allowlist_key kubernetes
[OUTPUT]
Name newrelic
Match *
licenseKey ${LICENSE_KEY}
endpoint ${ENDPOINT}
lowDataMode ${LOW_DATA_MODE}
Here is the parser configuration we use:
parsers.conf: |
[MULTILINE_PARSER]
name multiline-regex-error-trace
type regex
flush_timeout 1000
rule "start_state" "/([0-9]{2,4}\-[0-9]{1,2}\-[0-9]{1,2} [0-9]{1,2}\:[0-9]{1,2}\:[0-9]{1,2}\,[0-9]{2,4}) (.*)/" "stacktraceline2"
rule "stacktraceline2" "/^([a-z]{1,10})\.(.*)/" "stacktraceline3"
rule "stacktraceline3" "/^\s+at.*/" "stacktraceline3"
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
[PARSER]
Name cri
Format regex
Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
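To see what the `cri` parser above extracts, here is a minimal Python sketch of the same regex. The sample line is hypothetical (the timestamp prefix is invented for illustration), and Python's `re` module only approximates the regex engine Fluent Bit uses.

```python
import re

# Python spelling of the `cri` parser regex from parsers.conf.
# Python needs (?P<name>...) where the original uses (?<name>...).
CRI_LINE = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<message>.*)$"
)

# Hypothetical CRI-formatted line; the timestamp prefix is invented.
sample = "2022-09-02T18:46:53.206000000Z stdout F java.lang.Exception: Custom error"

match = CRI_LINE.match(sample)
fields = match.groupdict() if match else {}

# Print each captured field on its own line.
for key, value in fields.items():
    print(f"{key}: {value}")
```

In this sketch the application text ends up under `message`, whereas the `[FILTER]` in the question sets `multiline.key_content log`. It may be worth confirming which key the records actually carry on your nodes.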
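If we set the "Path" key under the "INPUT" section to "/var/log/containers/*.log", we instruct Fluent Bit to collect and process all logs from that directory, and that works fine. In this case, however, the custom multiline parser "multiline-regex-error-trace" does not seem to apply to every pod's logs. Stack traces are concatenated for only one of the pods; for all the remaining pods, each stack trace line is pushed as a separate record.
To make the parser work for a specific pod, we need to define the path as "/var/log/containers/myapp-svc1-.log" or "/var/log/containers/myapp-svc2-.log", i.e. depending on the pod name.
But this is not the desired configuration, because a pod-specific path restricts log collection to that one pod on the node, and we need logs from all pods.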
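The difference between the two kinds of "Path" value can be sketched in Python. This is only an approximation: `fnmatch` is not Fluent Bit's own glob handling, and the file names and pod-id suffixes below are invented for illustration.

```python
from fnmatch import fnmatch

# Hypothetical files on one node; the pod-id suffixes are invented.
files = [
    "/var/log/containers/myapp-svc1-aaa111.log",
    "/var/log/containers/myapp-svc2-bbb222.log",
    "/var/log/containers/myapp-svc3-ccc333.log",
    "/var/log/containers/myapp-svc4-ddd444.log",
]


def tailed(pattern):
    """Return the files a wildcard pattern would select."""
    return [name for name in files if fnmatch(name, pattern)]


# The wildcard path selects every pod's log file.
all_pods = tailed("/var/log/containers/*.log")
# A pod-specific pattern (hypothetical spelling) selects only one file.
one_pod = tailed("/var/log/containers/myapp-svc1-*.log")

print(len(all_pods), "files with the wildcard path")
print(len(one_pod), "file with the pod-specific path")
```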
Here is a sample of the log file:
2022-09-02 18:46:53,206 ERROR 5d9073b1-9f90-42c1-b7a2-d5a6c13f2669 [http-nio-9002-exec-3] i.f.m.c.c.ContentController: Exception recevied from the service
java.lang.Exception: Custom error
at myapp.content.controller.ContentController.throwError(ContentController.java:46)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:197)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:141)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:106)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:894)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:808)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1060)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:962)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:626)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:733)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:227)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at myapp.content.filter.RequestIdAddingFilter.doFilterInternal(RequestIdAddingFilter.java:51)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.doFilterInternal(WebMvcMetricsFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:542)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:143)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78)
at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:764)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:346)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:374)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:887)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1684)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.base/java.lang.Thread.run(Unknown Source)
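As a way to check the three rules of "multiline-regex-error-trace" offline, here is a rough Python sketch of a rule-driven state machine applied to a shortened version of the sample above. It is a simplification and does not claim to reproduce Fluent Bit's implementation. It assumes the `at ...` lines carry leading whitespace in the real log (as rule 3 requires), and the final INFO line is invented for illustration.

```python
import re

# Rules copied from the multiline parser: state -> (regex, next state).
RULES = {
    "start_state": (
        re.compile(r"([0-9]{2,4}\-[0-9]{1,2}\-[0-9]{1,2} [0-9]{1,2}\:[0-9]{1,2}\:[0-9]{1,2}\,[0-9]{2,4}) (.*)"),
        "stacktraceline2",
    ),
    "stacktraceline2": (re.compile(r"^([a-z]{1,10})\.(.*)"), "stacktraceline3"),
    "stacktraceline3": (re.compile(r"^\s+at.*"), "stacktraceline3"),
}


def concatenate(lines):
    """Group lines into records by walking the rule states."""
    records, current, state = [], [], None
    for line in lines:
        # Inside a record: does the line match the expected continuation rule?
        if state is not None and RULES[state][0].search(line):
            current.append(line)
            state = RULES[state][1]
            continue
        # Otherwise close the open record, if any.
        if current:
            records.append("\n".join(current))
            current = []
        state = None
        # Then check whether this line starts a new record.
        if RULES["start_state"][0].search(line):
            current = [line]
            state = RULES["start_state"][1]
        else:
            records.append(line)
    if current:
        records.append("\n".join(current))
    return records


sample = [
    "2022-09-02 18:46:53,206 ERROR 5d9073b1-9f90-42c1-b7a2-d5a6c13f2669 "
    "[http-nio-9002-exec-3] i.f.m.c.c.ContentController: Exception recevied from the service",
    "java.lang.Exception: Custom error",
    "\tat myapp.content.controller.ContentController.throwError(ContentController.java:46)",
    "\tat java.base/java.lang.Thread.run(Unknown Source)",
    "2022-09-02 18:46:54,001 INFO next request handled",  # hypothetical line
]

records = concatenate(sample)
print(len(records), "records")
```

Under these rules a `Caused by: ...` line would match neither the continuation rule nor the start rule, so it would end the current record and be emitted on its own.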
We tried Fluent Bit's built-in multiline parser "java", but it has a different problem. Our stack traces contain 3 types of log lines.
**Line Type 1:** 2022-09-02 18:46:53,206 ERROR 5d9073b1-9f90-42c1-b7a2-d5a6c13f2669 [http-nio-9002-exec-3] i.f.m.c.c.ContentController: Exception recevied from the service
**Line Type 2:** java.lang.Exception: Custom error
**Line Type 3:**
at myapp.content.controller.ContentController.throwError(ContentController.java:46)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
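The built-in "java" parser appears to combine only the last two types, "Line Type 2" and "Line Type 3"; it leaves out the first line. The newRelic bundle uses Fluent Bit 1.9.4 internally, which is the latest version as of today. To reproduce the issue, run 2 pods on a k8s node with Fluent Bit 1.9.4 and check whether the stack traces are concatenated for each pod. Make sure the log sample is as shown above. Can someone help us with this? I have been banging my head against the wall for the past 7 days.
Did you solve your problem? I have a similar issue, and I have updated Fluent Bit to the latest version.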
Is it in the wrong section? In the docs, they add `multiline.parser` to the [INPUT] section rather than the [FILTER] section. Something like:
[INPUT]
Name tail
Tag kube.*
Path ${PATH}
Parser ${LOG_PARSER}
DB ${FB_DB}
Mem_Buf_Limit 7MB
Skip_Long_Lines On
Refresh_Interval 10
multiline.parser multiline-regex-error-trace