I plan to run Flask behind Gunicorn on Kubernetes. For proper logging I want all logs emitted in JSON format.
Currently I am testing with minikube and https://github.com/inovex/kubernetes-logging to collect the logs with fluentd.
I managed to get the error logs (tracebacks) formatted correctly, thanks to: JSON-formatted logging with Flask and Gunicorn.
I am still struggling with the access log format. I specified the following Gunicorn access log format:
access_log_format = '{"remote_ip":"%(h)s","request_id":"%({X-Request-Id}i)s","response_code":"%(s)s","request_method":"%(m)s","request_path":"%(U)s","request_querystring":"%(q)s","request_timetaken":"%(D)s","response_length":"%(B)s"}'
The resulting log is JSON, but the message part (formatted per access_log_format) now contains escaped double quotes and is not parsed into its separate fields by Fluentd / ELK:
{"tags": [], "timestamp": "2017-12-07T11:50:20.362559Z", "level": "INFO", "host": "ubuntu", "path": "/usr/local/lib/python2.7/dist-packages/gunicorn/glogging.py", "message": "{\"remote_ip\":\"127.0.0.1\",\"request_id\":\"-\",\"response_code\":\"200\",\"request_method\":\"GET\",\"request_path\":\"/v1/records\",\"request_querystring\":\"\",\"request_timetaken\":\"19040\",\"response_length\":\"20\"}", "logger": "gunicorn.access"}
Thanks, Jpw
The simplest solution is to swap the outer single quotes for double quotes and the inner double quotes for single quotes, as shown below.
--access-logformat "{'remote_ip':'%(h)s','request_id':'%({X-Request-Id}i)s','response_code':'%(s)s','request_method':'%(m)s','request_path':'%(U)s','request_querystring':'%(q)s','request_timetaken':'%(D)s','response_length':'%(B)s'}"
Here are some sample log lines:
{'remote_ip':'127.0.0.1','request_id':'-','response_code':'404','request_method':'GET','request_path':'/test','request_querystring':'','request_timetaken':'6642','response_length':'233'}
{'remote_ip':'127.0.0.1','request_id':'-','response_code':'200','request_method':'GET','request_path':'/','request_querystring':'','request_timetaken':'881','response_length':'20'}
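A caveat worth noting: with the quotes swapped, each line is a Python dict literal rather than valid JSON, so a strict JSON parser (e.g. fluentd's json parser) will still reject it. A stdlib-only sketch of one way to recover such lines downstream:

```python
import ast
import json

# Single-quoted output is Python dict-literal syntax, not JSON.
# ast.literal_eval() can parse it safely; json.dumps() re-emits real JSON.
line = "{'remote_ip':'127.0.0.1','request_id':'-','response_code':'404'}"
record = ast.literal_eval(line)
print(json.dumps(record))
```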
I am looking for something that works in a logging config file. Also, I would rather not build the JSON format by hand.
Solution:
All of Gunicorn's logging atoms are available in the log record's args, so let's take the fields from there and let pythonjsonlogger do the rest for us.
Formatter class
from pythonjsonlogger.jsonlogger import JsonFormatter


class GunicornLogFormatter(JsonFormatter):
    def add_fields(self, log_record, record, message_dict):
        """
        This method allows us to inject gunicorn's args as fields for the formatter
        """
        super(GunicornLogFormatter, self).add_fields(log_record, record, message_dict)
        for field in self._required_fields:
            if field in self.rename_fields:
                log_record[self.rename_fields[field]] = record.args.get(field)
            else:
                log_record[field] = record.args.get(field)
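The mechanism this relies on can be sketched with the stdlib alone (JsonAccessFormatter and its field map below are hypothetical names, not part of pythonjsonlogger): Gunicorn logs the access line with the atoms dict as the record's single argument, so any formatter can read the fields straight out of record.args.

```python
import json
import logging

class JsonAccessFormatter(logging.Formatter):
    """Stdlib-only sketch: read gunicorn's atoms out of record.args."""

    # Hypothetical field selection; map any gunicorn atoms to output names.
    fields = {"h": "remote_ip", "s": "response_code", "U": "request_path"}

    def format(self, record):
        atoms = record.args if isinstance(record.args, dict) else {}
        return json.dumps({name: atoms.get(atom) for atom, name in self.fields.items()})

# logging unwraps a single mapping argument into record.args
record = logging.LogRecord(
    "gunicorn.access", logging.INFO, __file__, 0,
    "ignored format string",
    ({"h": "127.0.0.1", "s": "200", "U": "/v1/records"},), None)
print(JsonAccessFormatter().format(record))
```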
Example logging config file
{
"version": 1,
"disable_existing_loggers": false,
"formatters": {
"gunicorn_json": {
"()": "GunicornLogFormatter",
"format": "%(h)s %(r)s %({x-request-id}i)s",
"datefmt": "%Y-%m-%dT%H:%M:%S%z",
"rename_fields": {
"{x-request-id}i": "request_id",
"r": "request"
}
}
},
"handlers": {
"json-gunicorn-console": {
"class": "logging.StreamHandler",
"level": "INFO",
"formatter": "gunicorn_json",
"stream": "ext://sys.stdout"
}
},
"loggers": {
"gunicorn.access": {
"level": "INFO",
"handlers": [
"json-gunicorn-console"
]
}
}
}
Sample logs
{"h": "127.0.0.1", "request": "GET /login?next=/ HTTP/1.1", "request_id": null}
{"h": "127.0.0.1", "request": "GET /static/css/style.css HTTP/1.1", "request_id": null}
{"h": "127.0.0.1", "request": "GET /some/random/path HTTP/1.1", "request_id": null}
{"h": "127.0.0.1", "request": "GET /some/random/path HTTP/1.1", "request_id": "123123123123123123"}
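To point Gunicorn at this config, one option is a sketch using Gunicorn's logconfig_dict setting; it assumes the JSON above is saved as log_config.json next to gunicorn.conf.py, and that GunicornLogFormatter is importable under the bare name given in the "()" key (a dotted path such as myapp.logging.GunicornLogFormatter is the usual form):

```python
# gunicorn.conf.py -- sketch; file name and paths are assumptions
import json

with open("log_config.json") as f:
    logconfig_dict = json.load(f)

bind = "0.0.0.0:5000"
accesslog = "-"  # keep access logging enabled so gunicorn.access emits records
```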
Alternatively, escape the double quotes (\") in the value of --access-logformat to keep the log lines valid JSON. So if you run Gunicorn in a Docker container, your Dockerfile might end with something like:
CMD ["gunicorn", \
"-b", "0.0.0.0:5000", \
"--access-logfile", "-",\
"--access-logformat", "{\"remote_ip\":\"%(h)s\",\"request_id\":\"%({X-Request-Id}i)s\",\"response_code\":\"%(s)s\",\"request_method\":\"%(m)s\",\"request_path\":\"%(U)s\",\"request_querystring\":\"%(q)s\",\"request_timetaken\":\"%(D)s\",\"response_length\":\"%(B)s\"}", \
"app:create_app()"]
When using your example in the gunicorn config file:
access_log_format = '{"remote_ip":"%(h)s","request_id":"%({X-Request-Id}i)s","response_code":"%(s)s","request_method":"%(m)s","request_path":"%(U)s","request_querystring":"%(q)s","request_timetaken":"%(D)s","response_length":"%(B)s"}'
I get the desired behaviour of the message being read as JSON and merged with the fluentd JSON data. However, the gunicorn fields are not populated:
{"tags": [], "level": "INFO", "host": "ubuntu", "logger": "gunicorn.access", "remote_ip":"%(h)s","request_id":"%({X-Request-Id}i)s","response_code":"%(s)s","request_method":"%(m)s","request_path":"%(U)s","request_querystring":"%(q)s","request_timetaken":"%(D)s","response_length":"%(B)s"}
It looks like the cause is that Gunicorn passes access_log_format as the message to the logger, with all the atoms (safe_atoms) as an additional argument, e.g. in gunicorn/glogging.py:
safe_atoms = self.atoms_wrapper_class(
    self.atoms(resp, req, environ, request_time)
)
try:
    # safe_atoms = {"s": "200", "m": "GET", ...}
    self.access_log.info(self.cfg.access_log_format, safe_atoms)
However, if FluentRecordFormatter sees the string as valid JSON, it reads it with json.loads and ignores any args that were passed, in fluent/handler.py:
def _format_msg_json(self, record, msg):
    try:
        json_msg = json.loads(str(msg))  # <------- doesn't merge params
        if isinstance(json_msg, dict):
            return json_msg
        else:
            return self._format_msg_default(record, msg)
    except ValueError:
        return self._format_msg_default(record, msg)
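This can be seen directly: the unexpanded format string is itself valid JSON, so json.loads() succeeds and returns the template with the %(h)s placeholders still in place, which is exactly the unpopulated output shown above.

```python
import json

# The raw access_log_format parses as JSON before any args are merged in,
# so the placeholders survive unexpanded.
msg = '{"remote_ip":"%(h)s","response_code":"%(s)s"}'
parsed = json.loads(msg)
print(parsed)
```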
Compare this to the default Python formatter, which calls record.message = record.getMessage(), which in turn merges the args into the message, in logging/__init__.py:
def getMessage(self):
    """
    Return the message for this LogRecord.

    Return the message for this LogRecord after merging any user-supplied
    arguments with the message.
    """
    msg = str(self.msg)
    if self.args:
        msg = msg % self.args  # <------ args get merged in
    return msg
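In a nutshell: the access log format is the msg, the atoms dict is the single mapping argument, and %-formatting does the merge.

```python
# What getMessage() effectively does for a gunicorn access record:
msg = '{"remote_ip":"%(h)s","response_code":"%(s)s"}'
atoms = {"h": "127.0.0.1", "s": "200"}
merged = msg % atoms
print(merged)
```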
I have logged an issue against the fluent-logger-python project.
Workaround 1: perform the merge before the record is passed to FluentRecordFormatter:
logger = logging.getLogger('fluent.test')


class ContextFilter(logging.Filter):
    def filter(self, record):
        record.msg = record.msg % record.args
        return True


fluent_handler = handler.FluentHandler('app.follow', host='localhost', port=24224)
formatter = handler.FluentRecordFormatter()
fluent_handler.setFormatter(formatter)
merge_filter = ContextFilter()
fluent_handler.addFilter(merge_filter)
logger.addHandler(fluent_handler)
Edit: the log filter does not work:
ValueError: unsupported format character ';' (0x3b) at index 166
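The failure is reproducible in isolation: once the args have been merged, the text can contain literal % sequences (URL-encoded query strings, for example), and a second %-formatting pass then chokes on them.

```python
# A merged message containing a literal % sequence, as a querystring might.
merged = "GET /search?q=a%;b"
try:
    merged % {}  # the second %-formatting pass attempted by the formatter
    err = None
except ValueError as exc:
    err = str(exc)
print(err)
```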
It turns out FluentRecordFormatter does call the base getMessage implementation, merging the args into the message:

def format(self, record):
    # Compute attributes handled by parent class.
    super(FluentRecordFormatter, self).format(record)  # <------ record.message = record.msg % record.args
    # Add ours
    record.hostname = self.hostname
    # Apply format
    data = self._formatter(record)
    self._structuring(data, record)
    return data
The problem is that _format_msg_json(self, record, msg) uses the record.msg attribute, which is the unmerged data, whereas record.message holds the merged data. This creates a problem: my log filter merges/formats the data, but the log formatter then tries to do the same thing again and occasionally hits invalid format syntax.
Workaround 2: don't use JSON
<filter *.gunicorn.access>
@type parser
key_name message
reserve_time true
reserve_data true
remove_key_name_field true
hash_value_field access_log
<parse>
@type regexp
expression /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"$/
time_format %d/%b/%Y:%H:%M:%S %z
</parse>
</filter>
You can read about what these options do here: https://docs.fluentd.org/filter/parser
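As a quick sanity check of the expression (note it is Ruby regexp syntax: fluentd's (?&lt;name&gt;...) named groups become (?P&lt;name&gt;...) in Python's re module), it matches a line in Gunicorn's default access log format; the sample values below are assumptions for illustration:

```python
import re

# Python translation of the fluentd <parse> expression above.
pattern = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^ ]*) +\S*)?" (?P<code>[^ ]*) (?P<size>[^ ]*) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$')

# A line in gunicorn's default (Apache-style) access log format.
line = '127.0.0.1 - - [07/Dec/2017:11:50:20 +0000] "GET /v1/records HTTP/1.1" 200 20 "-" "curl/7.47.0"'
m = pattern.match(line)
print(m.group("method"), m.group("path"), m.group("code"))
```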