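I have an external JSON file that contains inline styling, HTML tags, and \n and \t characters. I want to remove all of these and keep only the plain strings, without breaking the JSON format. I have tried to do this and looked at many solutions, but nothing has worked so far. I really appreciate your time. Here is my code.
I am using Python 3.x.x.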
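import json, re
from html.parser import HTMLParser

def remove_html_tags(data):
    # Remove anything that looks like an HTML tag
    p = re.compile(r'<.*?>')
    return p.sub('', data)

with open('project-closedtasks-avgdaysopen.json') as f:
    data = json.load(f)

# Serialize back to JSON text and print it (remove_html_tags is not called yet)
data = json.dumps(data, indent=4)
print(data)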
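Note that this is the file I am reading (it is loaded from the same folder). I want the same output, but without the HTML tags, inline styles, \n, or anything else: only the plain strings.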
[
{
"idrfi" : 36809,
"fkproject" : 33235,
"subject" : "M2 - Flashing Clarifications",
"description" : "<ol style=\"margin-left:0.375in\">\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 5/A650 attached. Can the pre-finished metal panel be swapped for pre-finished metal flashing? This will allow the full assembly to be installed by the mechanical HVAC trade vs requiring the cladding trade to return for penthouse work. </span></li>\n</ol>\n",
"response" : null
},
{
"idrfi" : 36808,
"fkproject" : 33139,
"subject" : "M1 - Flashing Clarifications",
"description" : "<ol style=\"margin-left:0.2in\">\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 6/A612 attached. Clarify location of flashing on detail.</span></li>\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to details 2,4/A614 attached. Clarify location of flashing on detail. </span></li>\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 3/A616 attached. Clarify location of flashing on detail.</span></li>\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 5/A650 attached. Can the pre-finished metal panel be swapped for pre-finished metal flashing? This will allow the full assembly to be installed by the mechanical HVAC trade vs requiring the cladding trade to return for penthouse work. </span></li>\n</ol>\n",
"response" : null
}
]
I found this function, but I don't know how to apply it:
def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)
Edit: after implementing this, &nbsp;, \n, \t, and other things are still not removed. I want only the strings, with no tags and no styling.
import json, re
from html.parser import HTMLParser

def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

with open('project-closedtasks-avgdaysopen.json') as f:
    data = json.load(f)

data = json.dumps(data, indent=4)
removed_tags = remove_html_tags(data)
print(removed_tags)
Just call the function you wrote:
import json, re
from html.parser import HTMLParser

def remove_html_tags(data):
    # Remove anything that looks like an HTML tag
    p = re.compile(r'<.*?>')
    return p.sub('', data)

with open('project-closedtasks-avgdaysopen.json') as f:
    data = json.load(f)

# Serialize to JSON text, then run the regex over that text
data = json.dumps(data, indent=4)
removed_tags = remove_html_tags(data)
print(removed_tags)
I checked it, and it works fine.
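For the &nbsp;, \n, and \t mentioned in the edit, one possible approach is to clean each string value after json.load and before json.dumps, instead of running the regex over the dumped text. Once the data has been dumped, a newline inside a string is written as the two characters \ and n, so a whitespace pattern no longer sees it. The following is only a minimal sketch under some assumptions: clean_text and clean_json are hypothetical helper names, the inline sample is a shortened stand-in for the contents of project-closedtasks-avgdaysopen.json (with an &nbsp; added for illustration), and each tag is replaced by a space so that adjacent list items do not run together. The tag pattern is a regex heuristic, not a full HTML parser.

```python
import html
import json
import re

TAG_RE = re.compile(r'<[^>]+>')  # heuristic match for an HTML tag
WS_RE = re.compile(r'\s+')       # runs of whitespace, including \n, \t and non-breaking spaces

def clean_text(text):
    """Return text without tags, HTML entities, or extra whitespace."""
    text = TAG_RE.sub(' ', text)       # replace every tag with a space
    text = html.unescape(text)         # decode entities such as &nbsp; and &amp;
    return WS_RE.sub(' ', text).strip()

def clean_json(value):
    """Walk parsed JSON data and clean every string value (keys are left alone)."""
    if isinstance(value, str):
        return clean_text(value)
    if isinstance(value, list):
        return [clean_json(item) for item in value]
    if isinstance(value, dict):
        return {key: clean_json(item) for key, item in value.items()}
    return value                       # numbers, booleans and None pass through unchanged

# Shortened stand-in for the file contents (assumption, not the real file)
sample = [
    {
        "idrfi": 36808,
        "fkproject": 33139,
        "subject": "M1 - Flashing Clarifications",
        "description": "<ol style=\"margin-left:0.2in\">\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 6/A612 attached.&nbsp;Clarify location of flashing on detail.</span></li>\n</ol>\n",
        "response": None,
    }
]

cleaned = clean_json(sample)
print(json.dumps(cleaned, indent=4))
```

To try it on the real file, replace sample with the result of json.load(f) from the code above. Because the cleaning happens on the parsed data, json.dumps still produces valid JSON.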