Converting syslog to JSON - expecting an object


I have a Python Lambda function that converts syslog records to JSON. It fails with an unexpected error when deployed.

from __future__ import print_function

import base64
import json
import gzip
import re

print('Loading function')


def lambda_handler(event, context):
    output = []
    succeeded_record_cnt = 0
    failed_record_cnt = 0

    for record in event['records']:
        print(record['recordId'])
        payload = base64.b64decode(record['data'])

        regex_string = (r"^((?:\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?"
                        r"|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b\s+(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])\s+"
                        r"(?:(?:2[0123]|[01]?[0-9]):(?:[0-5][0-9]):(?:(?:[0-5]?[0-9]|60)(?:[:\.,][0-9]+)?)))) (?:<(?:[0-9]+).(?:[0-9]+)> )"
                        r"?((?:[a-zA-Z0-9._-]+)) ([\w\._/%-]+)(?:\[((?:[1-9][0-9]*))\])?: (.*)")
        p = re.compile(regex_string)
        m = p.match(payload)
        if m:
            succeeded_record_cnt += 1
            data_field = {
                'timestamp': m.group(1),
                'hostname': m.group(2),
                'program': m.group(3),
                'processid': m.group(4),
                'message': m.group(5)
            }
            output_record = {
                'recordId': record['recordId'],
                'result': 'Ok',
                'data': base64.b64encode(json.dumps(data_field))
            }
        else:
            print('Parsing failed')
            failed_record_cnt += 1
            output_record = {
                'recordId': record['recordId'],
                'result': 'ProcessingFailed',
                'data': record['data']
            }

        output.append(output_record)

    print('Processing completed.  Successful records {}, Failed records {}.'.format(succeeded_record_cnt, failed_record_cnt))
    return {'records': output}

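When I deployed it, I got an error suggesting that it expects the data as an object. I decoded the record and deployed it again, but got a similar error: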

[ERROR] TypeError: cannot use a string pattern on a bytes-like object
Traceback (most recent call last):
  File "/var/task/ec2_logs_parquet.py", line 24, in lambda_handler
    m = p.match(payload)

I tried to fix it with the patch below, which decodes the payload and passes the data through a separate variable, but it did not work.

    payload = base64.b64decode(record['data'])
    payload_str = payload.decode('utf-8')

    p = re.compile(regex_string)
    m = p.match(payload_str)

It still fails. Am I missing something here?

python json regex lambda syslog
1 Answer

Here is a simpler way to parse syslog records.

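for row in open('/var/log/syslog'):
    # The timestamp ("Mmm dd hh:mm:ss") occupies the first 15 characters.
    time = row[:15]
    host,prog,message = row[16:].split(maxsplit=2)
    if '[' in prog:
        # "sshd[1234]:" -> program "sshd", process ID "1234"
        prog,process = prog.split('[')
        process = process[:-2]
    else:
        # No process ID: just strip the trailing colon from the program name.
        prog = prog[:-1]
        process = ''

    data_field = {
        'timestamp': time,
        'hostname': host,
        'program': prog,
        'processid': process,
        'message': message.rstrip('\n')
    }
    print(data_field)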
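
The loop above reads from a file, whereas the Lambda in the question receives base64-encoded records, so the bytes/str handling still matters. In Python 3, `base64.b64decode` returns bytes, which is what triggers the `TypeError` with a string pattern. The follow-up error is not shown in the question, so this is only a guess: `base64.b64encode(json.dumps(data_field))` would fail next, because `b64encode` needs bytes and also returns bytes. The sketch below applies the slicing approach to the record shape used in the question. `parse_syslog_line` and `transform_record` are hypothetical helper names, and the code assumes every line has a host, a program, and a message.

```python
import base64
import json


def parse_syslog_line(line):
    """Split one syslog line into fields, using the slicing approach above."""
    timestamp = line[:15]
    host, prog, message = line[16:].split(maxsplit=2)
    if '[' in prog:
        # "sshd[1234]:" -> program "sshd", process ID "1234"
        prog, process = prog.split('[', 1)
        process = process.rstrip(']:')
    else:
        prog = prog.rstrip(':')
        process = ''
    return {
        'timestamp': timestamp,
        'hostname': host,
        'program': prog,
        'processid': process,
        'message': message.rstrip('\n'),
    }


def transform_record(record):
    """Hypothetical helper: decode one record, parse it, and re-encode it."""
    # b64decode returns bytes in Python 3; decode to str before parsing.
    line = base64.b64decode(record['data']).decode('utf-8')
    fields = parse_syslog_line(line)
    # json.dumps returns str, while b64encode takes and returns bytes:
    # encode on the way in, decode on the way out so the result is a plain str.
    data = base64.b64encode(json.dumps(fields).encode('utf-8')).decode('utf-8')
    return {'recordId': record['recordId'], 'result': 'Ok', 'data': data}
```

Inside `lambda_handler`, each element of `event['records']` could be passed to `transform_record`, with a `try`/`except ValueError` around the call to emit a `ProcessingFailed` record when a line does not split into three fields.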