How to append spaCy model results to XML: need an XSLT or Python script that does not damage the existing tags


I have this JSON data that I want to append to the XML below:

{
        "para-num": 31,
        "ele-id": "71FCC3AE",
        "conv-tag": "para",
        "text": "All chlorophylls have weak absorption at green wavelengths, making plants appear green. Chemical modification of chlorophylls manipulates the p-p electron conjugation resulting in changed spectral properties, therefore, photosynthetic organisms with different chlorophyll contents may be much better matched to the solar spectrum than those containing only one kind of chlorophylls (Fig 1). Plants growing in the shaded environments have higher content of Chl b to capture more light (Hikosaka and Terashima 1995). Even within the same plant, the position of leaves shows a changed ratio of chlorophylls to match the changed light environments. On the top of the canopy, less chlorophyll content per area is observed; in contrast, a higher ratio of Chl b to Chl a is present in the leaves at the lower level or shaded parts of the plant (Gu et al. 2017; Hikosaka and Terashima 1995; Lichtenthaler et al. 2017).  Additionally, by introducing vertically oriented leaves to change the canopy structure demonstrated enhanced yield of rice (Oryza sativa) (Long et al. 2006).",
        "token": null,
        "nlp-data": [
            {
                "text": "Fig 1",
                "start-index": 383,
                "end-index": 388,
                "label": "FIGURE_CITE"
            },
            {
                "text": "Hikosaka and Terashima 1995",
                "start-index": 485,
                "end-index": 512,
                "label": "REF_CITE"
            },
            {
                "text": "Gu et al. 2017",
                "start-index": 838,
                "end-index": 852,
                "label": "REF_CITE"
            },
            {
                "text": "Hikosaka and Terashima 1995",
                "start-index": 854,
                "end-index": 881,
                "label": "REF_CITE"
            },
            {
                "text": "Lichtenthaler et al. 2017",
                "start-index": 883,
                "end-index": 908,
                "label": "REF_CITE"
            },
            {
                "text": "Long et al. 2006",
                "start-index": 1051,
                "end-index": 1067,
                "label": "REF_CITE"
            }
        ]
}
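
Before touching the XML, it may be worth confirming how the offsets index into `text`. Judging from the data above, `start-index` and `end-index` look like half-open character offsets (for "Fig 1", 388 - 383 = 5, the length of the string). The sketch below checks that assumption; `check_offsets` and the shortened `record` are hypothetical names and data introduced only for illustration.

```python
def check_offsets(record):
    """Return the nlp-data entries whose offsets do not reproduce their text."""
    text = record["text"]
    return [
        ent for ent in record["nlp-data"]
        if text[ent["start-index"]:ent["end-index"]] != ent["text"]
    ]


# Hypothetical, shortened record in the same shape as the JSON above.
record = {
    "text": "see (Gu et al. 2017; Fig 1).",
    "nlp-data": [
        {"text": "Gu et al. 2017", "start-index": 5, "end-index": 19, "label": "REF_CITE"},
        {"text": "Fig 1", "start-index": 21, "end-index": 26, "label": "FIGURE_CITE"},
    ],
}

# An empty list means every offset pair matches its text.
print(check_offsets(record))
```

If the list is not empty, the offsets probably refer to a differently normalized string (for example, collapsed whitespace), and they would need to be remapped before being applied to the XML.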

XML data:

<PARA ele-id="71FCC3AE" alignment="JUSTIFY (3)" space_after="6.0">All chlorophylls have weak absorption at green wavelengths, making plants appear green. Chemical modification of chlorophylls manipulates the p-p electron conjugation resulting in changed spectral properties, therefore, photosynthetic organisms with different chlorophyll contents may be much better matched to the solar spectrum than those containing only one kind of chlorophylls (Fig 1). Plants growing in the shaded environments have higher content of Chl <emphasis role="cs_italic"><emphasis role="italic">b</emphasis></emphasis> to capture more light (Hikosaka and Terashima 1995). Even within the same plant, the position of leaves shows a changed ratio of chlorophylls to match the changed light environments. On the top of the canopy, less chlorophyll content per area is observed; in contrast, a higher ratio of Chl <emphasis role="cs_italic"><emphasis role="italic">b</emphasis></emphasis> to Chl <emphasis role="cs_italic"><emphasis role="italic">a</emphasis></emphasis> is present in the leaves at the lower level or shaded parts of the plant (Gu <emphasis role="cs_italic"><emphasis role="italic">et al.</emphasis></emphasis> 2017; Hikosaka and Terashima 1995; Lichtenthaler <emphasis role="cs_italic"><emphasis role="italic">et al.</emphasis></emphasis> 2017).  Additionally, by introducing vertically oriented leaves to change the canopy structure demonstrated enhanced yield of rice (<emphasis role="cs_italic"><emphasis role="italic">Oryza sativa</emphasis></emphasis>) (Long <emphasis role="cs_italic"><emphasis role="italic">et al.</emphasis></emphasis> 2006).</PARA>

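Using XSLT or Python, I want to wrap each matched string, from its start position to its end position, in a tag named after its label, without tampering with the existing XML markup. I tried doing the offset arithmetic with BeautifulSoup, but had difficulty wrapping the strings. The expected output is: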

<PARA ele-id="71FCC3AE" alignment="JUSTIFY (3)" space_after="6.0">All chlorophylls have weak absorption at green wavelengths, making plants appear green. Chemical modification of chlorophylls manipulates the p-p electron conjugation resulting in changed spectral properties, therefore, photosynthetic organisms with different chlorophyll contents may be much better matched to the solar spectrum than those containing only one kind of chlorophylls (<FIGURE_CITE>Fig 1</FIGURE_CITE>). Plants growing in the shaded environments have higher content of Chl <emphasis role="cs_italic"><emphasis role="italic">b</emphasis></emphasis> to capture more light (<REF_CITE>Hikosaka and Terashima 1995</REF_CITE>). Even within the same plant, the position of leaves shows a changed ratio of chlorophylls to match the changed light environments. On the top of the canopy, less chlorophyll content per area is observed; in contrast, a higher ratio of Chl <emphasis role="cs_italic"><emphasis role="italic">b</emphasis></emphasis> to Chl <emphasis role="cs_italic"><emphasis role="italic">a</emphasis></emphasis> is present in the leaves at the lower level or shaded parts of the plant (<REF_CITE>Gu <emphasis role="cs_italic"><emphasis role="italic">et al.</emphasis></emphasis> 2017</REF_CITE>; <REF_CITE>Hikosaka and Terashima 1995</REF_CITE>; <REF_CITE>Lichtenthaler <emphasis role="cs_italic"><emphasis role="italic">et al.</emphasis></emphasis> 2017</REF_CITE>).  Additionally, by introducing vertically oriented leaves to change the canopy structure demonstrated enhanced yield of rice (<emphasis role="cs_italic"><emphasis role="italic">Oryza sativa</emphasis></emphasis>) (<REF_CITE>Long <emphasis role="cs_italic"><emphasis role="italic">et al.</emphasis></emphasis> 2006</REF_CITE>).</PARA>
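
One possible way to get this result in Python is sketched below. It is not a definitive implementation, and it rests on these assumptions: the offsets are half-open character positions into the concatenated text of the `PARA` element, spans do not overlap, and each span starts and ends in text that is a direct child of `PARA` (true for every span in the sample, since the `emphasis` elements sit wholly inside the cited strings). The function name `wrap_spans` and its helpers are hypothetical. The sketch uses the standard library's `xml.etree.ElementTree`; `lxml.etree` has the same `text`/`tail` model, so it should port with little change.

```python
import xml.etree.ElementTree as ET


def _append(container, piece):
    """Append a text run or an element to the end of container's content."""
    if isinstance(piece, str):
        if len(container):  # text after the last child lives in that child's tail
            container[-1].tail = (container[-1].tail or "") + piece
        else:
            container.text = (container.text or "") + piece
    else:
        piece.tail = None  # the old tail is re-added as its own text run
        container.append(piece)


def _covers(span, a, b):
    """True if segment [a, b) lies inside span; empty segments must be strictly inside."""
    start, end, _label = span
    return (start <= a and b <= end) if a < b else (start < a < end)


def wrap_spans(parent, spans):
    """Wrap half-open character ranges of parent's text in new child elements.

    spans: (start, end, label) tuples, assumed non-overlapping, with offsets
    into "".join(parent.itertext()). Both ends of a span must fall in text
    that is a direct child of parent; otherwise ValueError is raised.
    """
    spans = [tuple(s) for s in spans]

    # 1. Flatten the direct content into text runs and child elements.
    pieces = [parent.text] if parent.text else []
    for child in parent:
        pieces.append(child)
        if child.tail:
            pieces.append(child.tail)

    # 2. Cut the text runs at every span boundary, tracking global offsets.
    cuts = sorted({p for start, end, _ in spans for p in (start, end)})
    segments, pos = [], 0
    for piece in pieces:
        if isinstance(piece, str):
            end = pos + len(piece)
            bounds = [pos] + [c for c in cuts if pos < c < end] + [end]
            segments += [(a, b, piece[a - pos:b - pos]) for a, b in zip(bounds, bounds[1:])]
        else:
            end = pos + len("".join(piece.itertext()))
            if any(pos < c < end for c in cuts):
                raise ValueError(f"span boundary falls inside <{piece.tag}>")
            segments.append((pos, end, piece))
        pos = end

    # 3. Rebuild the content, routing covered segments into one wrapper per span.
    for child in list(parent):
        parent.remove(child)
    parent.text = None
    wrappers = {}
    for a, b, piece in segments:
        span = next((s for s in spans if _covers(s, a, b)), None)
        if span is None:
            _append(parent, piece)
        else:
            if span not in wrappers:
                wrappers[span] = ET.SubElement(parent, span[2])
            _append(wrappers[span], piece)
    return parent


# Shortened, hypothetical paragraph in the same shape as the XML above.
para = ET.fromstring('<PARA ele-id="x">see (Gu <emphasis>et al.</emphasis> 2017; Fig 1).</PARA>')
wrap_spans(para, [(5, 19, "REF_CITE"), (21, 26, "FIGURE_CITE")])
print(ET.tostring(para, encoding="unicode"))
```

For the real data, the spans could be built from the JSON as `[(e["start-index"], e["end-index"], e["label"]) for e in record["nlp-data"]]`. Existing elements are moved, not recreated, so their attributes and nesting are preserved. A span that begins or ends inside a child element is rejected, because wrapping it would require splitting that element.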
python-3.x xslt beautifulsoup lxml spacy-3