Strip HTML, inline styles, newlines, and tab characters from JSON in Python

Question

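I have an external JSON file that contains some encoded characters, HTML tags, and \n and \t characters. I want to remove all of these and keep only the plain strings, without breaking the JSON format. I have tried many solutions so far, but none of them worked. I would really appreciate your time. Here is my code:

I am using Python 3.x.x.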

import json, re
from html.parser import HTMLParser

def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

with open('project-closedtasks-avgdaysopen.json') as f:
    data = json.load(f)
    data = json.dumps(data, indent=4)
print(data)

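Note that this is the file I am reading (loaded from the same folder). I want the same output, but without HTML tags, without inline styles, and without \n or anything else: only the plain strings.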

[
    {
        "idrfi" : 36809,
        "fkproject" : 33235,
        "subject" : "M2 - Flashing Clarifications",
        "description" : "<ol style=\"margin-left:0.375in\">\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 5/A650 attached. Can the pre-finished metal panel be swapped for pre-finished metal flashing? This will allow the full assembly to be installed by the mechanical HVAC trade vs requiring the cladding trade to return for penthouse work. </span></li>\n</ol>\n",
        "response" : null
    },
    {
        "idrfi" : 36808,
        "fkproject" : 33139,
        "subject" : "M1 - Flashing Clarifications",
        "description" : "<ol style=\"margin-left:0.2in\">\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 6/A612 attached. Clarify location of flashing on detail.</span></li>\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to details 2,4/A614 attached. Clarify location of flashing on detail. </span></li>\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 3/A616 attached. Clarify location of flashing on detail.</span></li>\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 5/A650 attached. Can the pre-finished metal panel be swapped for pre-finished metal flashing? This will allow the full assembly to be installed by the mechanical HVAC trade vs requiring the cladding trade to return for penthouse work. </span></li>\n</ol>\n",
        "response" : null
    }
]

I found this function, but I don't know how to apply it:

def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

Edit: after applying it as shown here, &nbsp;, \n, \t and other things are still not removed. I want only the strings, with no tags and no styling.

import json, re
from html.parser import HTMLParser

def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

with open('project-closedtasks-avgdaysopen.json') as f:
    data = json.load(f)
    data = json.dumps(data, indent=4)
    removed_tags = remove_html_tags(data)
print(removed_tags)

Tags: python-3.x html5

1 Answer

Just call the function you wrote:

import json, re
from html.parser import HTMLParser

def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

with open('project-closedtasks-avgdaysopen.json') as f:
    data = json.load(f)
    data = json.dumps(data, indent=4)
    removed_tags = remove_html_tags(data)
print(removed_tags)

I tested it and it works fine.
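
Running the regex over the output of json.dumps removes the tags, but it leaves two things behind. Inside the dumped text a newline is written as the two characters backslash and n, so it is no longer whitespace that can be stripped, and &nbsp; is an entity rather than a tag, so `<.*?>` never matches it. A possible alternative, sketched below, is to clean each string value after json.load and only then dump the result. It swaps the regex for the standard-library HTMLParser that the code above already imports. The names TextExtractor, strip_html and clean are hypothetical helpers, and the inline raw string is a shortened stand-in for the file contents, not the real data.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities such as &nbsp;
        # arrive in handle_data already decoded
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return text without tags, with entities decoded and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    # str.split() with no arguments splits on any run of whitespace,
    # including \n, \t and the non-breaking space produced by &nbsp;
    return " ".join("".join(parser.parts).split())


def clean(value):
    """Recursively clean every string inside a decoded JSON structure."""
    if isinstance(value, str):
        return strip_html(value)
    if isinstance(value, list):
        return [clean(item) for item in value]
    if isinstance(value, dict):
        return {key: clean(item) for key, item in value.items()}
    return value  # numbers, booleans and null are left untouched


# Shortened inline sample standing in for the contents of the JSON file
raw = (
    '[{"idrfi": 36808, "description": "<ol style=\\"margin-left:0.2in\\">'
    '\\n\\t<li><span>Clarify&nbsp;location of flashing on detail.</span></li>'
    '\\n</ol>\\n", "response": null}]'
)

records = json.loads(raw)
cleaned = clean(records)
print(json.dumps(cleaned, indent=4))
```

With the real file, json.loads(raw) would be replaced by json.load(f) inside the existing with block. Because the cleaning happens on decoded Python strings and json.dumps does the re-encoding, the output stays valid JSON. One limitation of this sketch: text from adjacent tags that have no whitespace between them is joined without a space.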
