Replace an extra quote in JSON using a regular expression

Question

My JSON string contains an unexpected quote that makes json.loads(jstr) fail.

json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''

So I want to use a regular expression to match and remove the quote inside the "content" value. I tried something taken from another solution:

import re
json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''
pa = re.compile(r'(:\s+"[^"]*)"(?=[^"]*",)')
pa.findall(json_str)

[out]: []

Is there a way to fix the string?

Answer 1

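As @jonrsharpe pointed out, you would be far better off cleaning up the source. That said, if you have no control over where the extra quote comes from, you can use (*SKIP)(*FAIL) with the newer regex module together with negative lookarounds, like so: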

"[^"]+":\s*"[^"]+"[,}]\s*(*SKIP)(*FAIL)|(?<![,:])"(?![:,]\s*["}])

See a demo on regex101.com.

In Python:

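import json
import regex as re  # third-party "regex" module; the stdlib "re" has no (*SKIP)(*FAIL)

json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''

# clean the json: skip well-formed "key":"value" pairs, then drop any remaining stray quote
# (raw string so that \s is not treated as a string escape)
rx = re.compile(r'''"[^"]+":\s*"[^"]+"[,}]\s*(*SKIP)(*FAIL)|(?<![,:])"(?![:,]\s*["}])''')
json_str = rx.sub('', json_str)

# load it (named "data" so the json module is not shadowed)
data = json.loads(json_str)
print(data['id'])
# 9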
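
If installing the regex module is not an option, the lookaround half of the pattern can be approximated with the standard re module. This is only a sketch under a strong assumption: the JSON is compact (no whitespace around separators), so every structural quote directly touches one of { , : on its left or one of : , } on its right. The name stray_quote is made up for illustration.

```python
import json
import re

json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''

# Assumption: in compact JSON a structural quote is always preceded by
# one of { , : or followed by one of : , } and any other quote is stray.
stray_quote = re.compile(r'(?<![{,:])"(?![:,}])')

cleaned = stray_quote.sub('', json_str)
data = json.loads(cleaned)
print(data['content'])
```

This will remove legitimate quotes if the input has spaces after colons or commas, so it is less robust than the (*SKIP)(*FAIL) version above.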

Answer 2

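A possible solution that I used:

import json
import re

filename = "data.txt"  # hypothetical path; the file holds one JSON object per line

# Capture the "content" value plus the '","x' that begins the next key
pa = re.compile(r'"content":\s?"(.*?","\w)')

whole = []
with open(filename) as fin:
    for eachline in fin:
        for s in pa.findall(eachline):
            s = s[:-4]  # drop the trailing '","x' so only the value remains
            s_fix = s.replace('"', '')  # remove the stray quotes inside the value
            eachline = eachline.replace(s, s_fix)

        data = json.loads(eachline)
        whole.append(data)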
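
To see what the loop does to a single record, the same steps can be applied to the sample string from the question. This is a sketch only: fix_line is a hypothetical helper name, and it assumes "content" is not the last key, so the value is always followed by '","' and a word character.

```python
import json
import re

# Same pattern as in the loop: the "content" value plus the '","x' of the next key
pa = re.compile(r'"content":\s?"(.*?","\w)')

def fix_line(line):
    """Strip quotes found inside the "content" value of one JSON line."""
    for s in pa.findall(line):
        s = s[:-4]  # cut off the trailing '","x'
        line = line.replace(s, s.replace('"', ''))
    return line

sample = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''
data = json.loads(fix_line(sample))
print(data["content"])
```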
