I have a text file in an XML-like language, as shown below:
<StoryText>
<DefaultStyle/>
<para ALIGN="3" LINESP="10"/>
<tab FONT="Times New Roman Regular" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0"/>
<ITEXT FONT="Times New Roman Regular" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" CH="TEXT"/>
<ITEXT FONT="Times New Roman Bold" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" CH="TEXT"/>
<ITEXT FONT="Times New Roman Regular" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" CH="TEXT"/>
</StoryText>
My goal is to parse this file in Python so that I can replace the content of the CH= attribute with other text.
Example:
> <ITEXT FONT="Times New Roman Bold" FONTSIZE="10" FEATURES="inherit"
> FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5"
> TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1"
> TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0"
> CH="**TEXT**"/>
becomes
> <ITEXT FONT="Times New Roman Bold" FONTSIZE="10" FEATURES="inherit"
> FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5"
> TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1"
> TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0"
> CH="**REPLACEMENT TEXT**"/>
I tried the xml.etree.ElementTree library with its parse and getroot methods, as I usually do, but here I get this error message:
xml.etree.ElementTree.ParseError: no element found
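The message apparently appears because the file is not real XML, even though it looks similar.
Do you know how I can achieve this? Note: I am not allowed to reformat the input file by changing its structure, because it is a Scribus .sla file.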
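The snippet shown in the question is well-formed XML, so the error may have another cause. With the standard library parser, "no element found" is what you typically get when the input is empty or ends before the root element is closed (for example, when the file was empty or a file object had already been read to the end). A minimal sketch to check this; the helper name `try_parse` and the shortened sample are only for illustration:

```python
import xml.etree.ElementTree as ET


def try_parse(text):
    """Return the root tag on success, or the ParseError message on failure."""
    try:
        return ET.fromstring(text).tag
    except ET.ParseError as exc:
        return str(exc)


# Shortened stand-in for the StoryText snippet from the question.
sample = '<StoryText><ITEXT CH="TEXT"/></StoryText>'

ok = try_parse(sample)                # well-formed input parses fine
empty = try_parse("")                 # nothing to parse at all
truncated = try_parse("<StoryText>")  # root element is never closed

print(ok)
print(empty)
print(truncated)
```

If your real file triggers the same message, it is worth checking that the path is correct and that the file is not empty before assuming the format is the problem.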
Try:
import xml.etree.ElementTree as ET
xml_data = """\
<StoryText>
<DefaultStyle/>
<para ALIGN="3" LINESP="10"/>
<tab FONT="Times New Roman Regular" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0"/>
<ITEXT FONT="Times New Roman Regular" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" CH="TEXT"/>
<ITEXT FONT="Times New Roman Bold" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" CH="TEXT"/>
<ITEXT FONT="Times New Roman Regular" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" CH="TEXT"/>
</StoryText>"""
root = ET.fromstring(xml_data)
# Update the CH attribute of every ITEXT element that currently holds "TEXT".
for elem in root.iter("ITEXT"):
    if elem.get("CH") == "TEXT":
        elem.set("CH", "REPLACEMENT TEXT")
print(ET.tostring(root, encoding="utf-8").decode("utf-8"))
Prints:
<StoryText>
<DefaultStyle />
<para ALIGN="3" LINESP="10" />
<tab FONT="Times New Roman Regular" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" />
<ITEXT FONT="Times New Roman Regular" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" CH="REPLACEMENT TEXT" />
<ITEXT FONT="Times New Roman Bold" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" CH="REPLACEMENT TEXT" />
<ITEXT FONT="Times New Roman Regular" FONTSIZE="10" FEATURES="inherit" FCOLOR="Black" FSHADE="100" SCOLOR="Black" SSHADE="100" TXTSHX="5" TXTSHY="-5" TXTOUT="1" TXTULP="-0.1" TXTULW="-0.1" TXTSTP="-0.1" TXTSTW="-0.1" SCALEH="100" SCALEV="100" BASEO="0" KERN="0" CH="REPLACEMENT TEXT" />
</StoryText>
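The same idea can be applied to the file on disk rather than to a string. Below is a minimal sketch, assuming the whole .sla file parses as XML; the function name `replace_ch` and the file names in the usage note are hypothetical. Note that ElementTree writes empty elements as ` />` (as in the output above) and does not keep the original XML declaration, so the result is equivalent XML but not byte-identical to the input.

```python
import xml.etree.ElementTree as ET


def replace_ch(src_path, dst_path, old, new):
    """Set CH=new on every ITEXT element whose CH equals old; return the count."""
    tree = ET.parse(src_path)
    changed = 0
    for elem in tree.getroot().iter("ITEXT"):
        if elem.get("CH") == old:
            elem.set("CH", new)
            changed += 1
    # Ask for an explicit XML declaration, since the original one is not preserved.
    tree.write(dst_path, encoding="UTF-8", xml_declaration=True)
    return changed
```

Usage would look like `replace_ch("document.sla", "document_out.sla", "TEXT", "REPLACEMENT TEXT")`. Writing to a separate output file keeps the original intact until you have confirmed that Scribus opens the result.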