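I have a MARC XML file containing a collection of two records. I want to remove the datafields with tag 955 from the file.
When I iterate over the list produced by findall and try to remove each element, I get ValueError: list.remove(x): x not in list.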
import xml.etree.ElementTree as ET
tree = ET.parse('toggle.xml')
root = tree.getroot()
for a955 in root.findall('record/datafield[@tag="955"]'):
    root.remove(a955)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[1], line 6
3 root = tree.getroot()
5 for a955 in root.findall('record/datafield[@tag="955"]'):
----> 6 root.remove(a955)
ValueError: list.remove(x): x not in list
This is the XML I'm trying to modify (I've removed some datafields for brevity):
<collection>
<record>
<leader>00859cam a2200277Ia 4500</leader>
<controlfield tag="005">20170510144913.0</controlfield>
<controlfield tag="008">880930s1983 enka 00010 eng d</controlfield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">ocm13279646 880930</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="9">0674-46060</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">(StEdNL)1580610-nlsdb-Voyager</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2="0">
<subfield code="a">Evans, Martin.</subfield>
</datafield>
<datafield tag="710" ind1="2" ind2="0">
<subfield code="a">Health Education Council.</subfield>
<subfield code="w">cn</subfield>
</datafield>
<datafield tag="710" ind1="2" ind2="0">
<subfield code="a">Teachers' Advisory Council on Alcohol and Drug Education.</subfield>
</datafield>
<datafield tag="955" ind1=" " ind2=" ">
<subfield code="a">QP4.88.1745</subfield>
<subfield code="b">QP4DOT88DOT</subfield>
</datafield>
<datafield tag="956" ind1=" " ind2=" ">
<subfield code="a">NLS</subfield>
</datafield>
</record>
<record>
<leader>01030cas a2200349 i 4500</leader>
<controlfield tag="005">20190312175642.0</controlfield>
<controlfield tag="008">130830c20139999stkwr ne 0 a0eng d</controlfield>
<datafield tag="015" ind1=" " ind2=" ">
<subfield code="a">GBB386135</subfield>
<subfield code="2">bnb</subfield>
</datafield>
<datafield tag="022" ind1="1" ind2=" ">
<subfield code="a">2053-6496</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">(Uk)016484976</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">2992934</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">(StEdNL)5112576-nlsdb-Voyager</subfield>
</datafield>
<datafield tag="651" ind1=" " ind2="0">
<subfield code="a">Troon (Scotland)</subfield>
<subfield code="v">Newspapers.</subfield>
</datafield>
<datafield tag="651" ind1=" " ind2="0">
<subfield code="a">South Ayrshire (Scotland)</subfield>
<subfield code="v">Newspapers.</subfield>
</datafield>
<datafield tag="752" ind1=" " ind2=" ">
<subfield code="a">Scotland</subfield>
<subfield code="b">Strathclyde</subfield>
<subfield code="d">Troon.</subfield>
<subfield code="2">blnpn</subfield>
</datafield>
<datafield tag="919" ind1=" " ind2=" ">
<subfield code="a">NBS</subfield>
</datafield>
<datafield tag="955" ind1=" " ind2=" ">
<subfield code="y">2020</subfield>
<subfield code="b">V000258858</subfield>
</datafield>
</record>
</collection>
The example from the ElementTree docs:
for country in root.findall('country'):
    # using root.findall() to avoid removal during traversal
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
I'm sure I'm doing something very basic wrong, but I can't figure out what it is.
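According to the docs:
remove(subelement)
Removes subelement from the element.
Here, subelement has to be a direct child of the element that remove is called on. remove does not traverse the whole tree to find the element you ask it to remove.
The elements you selected are not children of root (aka collection) but of record, so remove has to be called on the record that contains each datafield, not on root.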
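A minimal sketch of one way to apply this: loop over the record elements and call remove on each record for its own 955 datafields. The inline SAMPLE string is a trimmed stand-in for toggle.xml, assumed to have the same collection/record/datafield layout as the file in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for toggle.xml (assumption: same layout as the question's file).
SAMPLE = """<collection>
  <record>
    <datafield tag="955" ind1=" " ind2=" "><subfield code="a">QP4.88.1745</subfield></datafield>
    <datafield tag="956" ind1=" " ind2=" "><subfield code="a">NLS</subfield></datafield>
  </record>
  <record>
    <datafield tag="919" ind1=" " ind2=" "><subfield code="a">NBS</subfield></datafield>
    <datafield tag="955" ind1=" " ind2=" "><subfield code="y">2020</subfield></datafield>
  </record>
</collection>"""

root = ET.fromstring(SAMPLE)

# Each datafield is a direct child of its record, so remove it from the record.
for record in root.findall('record'):
    # findall() returns a list, so removing while looping over it is safe.
    for a955 in record.findall('datafield[@tag="955"]'):
        record.remove(a955)

# Collect the tags that are left, in document order.
remaining_tags = [df.get('tag') for df in root.iter('datafield')]
print(remaining_tags)
```

With the real file, the same loop should work on the root obtained from ET.parse('toggle.xml').getroot(), and the result can then be saved with tree.write() under whatever output filename you choose.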