python-docx:删除参考书目(w:sdt)部分

问题描述 投票:0回答:1

我的最终目标是从 Microsoft Word 文档中查找并删除任何参考书目部分。

本期所述。 目前没有任何对

w:sdt
标签的 API 支持(我认为仅适用于参考书目)。不过,针对这个问题,给出了一个解决方法:

paragraph = ... # however you get the paragraph, maybe with `for paragraph in document.paragraphs`
p = paragraph._element
sdts = p.xpath('w:sdt')
for sdt in sdts:
    parent = sdt.getparent()
    parent.remove(sdt)

使用上述内容时,文档的 xml 会按照我的意愿更新和删除

w:sdt
标签。但是,当我保存新文档时,输出 .docx 文件仍然包含参考书目部分。

为什么尽管 python-docx 文档的 xml 不包含

w:sdt
元素,但在 Microsoft Word 中保存并打开文档后却没有反映出来。我已阅读here,文档关系可能与此有关,但考虑到我在打开新的Word文档时没有收到错误,我认为这不是问题。有什么想法吗?

这是 xml 片段预处理:

<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:oel="http://schemas.microsoft.com/office/2019/extlst" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" w14:paraId="2BE25C04" w14:textId="0C685DB1" w:rsidR="00572630" w:rsidRDefault="006A14EA">
  <w:r>
    <w:t xml:space="preserve">Hey there </w:t>
  </w:r>
  <w:sdt>
    <w:sdtPr>
      <w:id w:val="-39898552"/>
      <w:citation/>
    </w:sdtPr>
    <w:sdtContent>
      <w:r>
        <w:fldChar w:fldCharType="begin"/>
      </w:r>
      <w:r>
        <w:instrText xml:space="preserve"> CITATION Fra69 \l 2057 </w:instrText>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="separate"/>
      </w:r>
      <w:r>
        <w:rPr>
          <w:noProof/>
        </w:rPr>
        <w:t>(Herbert, 1969)</w:t>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="end"/>
      </w:r>
    </w:sdtContent>
  </w:sdt>
</w:p>

这是 xml 片段后处理,您可以在其中看到参考书目已被删除,但是一旦我保存并打开文档,就看不到这些更改:

<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:oel="http://schemas.microsoft.com/office/2019/extlst" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" w14:paraId="2BE25C04" w14:textId="0C685DB1" w:rsidR="00572630" w:rsidRDefault="006A14EA">
  <w:r>
    <w:t xml:space="preserve">Hey there </w:t>
  </w:r>
</w:p>

注意:我在这里问这个问题是因为 python-docx GitHub 最近没有太多活动。

python python-3.x ms-word python-docx
1个回答
0
投票

我意识到我的问题是我只删除了作为段落元素子元素的 w:sdt 元素,而不是主要参考书目 w:sdt 元素。需要删除两者才能完全删除所有参考书目及其参考文献。

供参考 - 以下是删除参考书目参考:

paragraph = ... # however you get the paragraph, maybe with `for paragraph in document.paragraphs`
p = paragraph._element
sdts = p.xpath('w:sdt')
for sdt in sdts:
    parent = sdt.getparent()
    parent.remove(sdt)

下面将删除主要参考书目元素:

body = doc._body._element
sdts = body.xpath("w:sdt")
if sdts:
    for sdt in sdts:
        parent = sdt.getparent()
        parent.remove(sdt)
© www.soinside.com 2019 - 2024. All rights reserved.