How do I remove a tag from a node in lxml without removing its tail?

Question

Example:

html = <a><b>Text</b>Text2</a>

BeautifulSoup code:

[x.extract() for x in html.findAll('b')]

The output is:

html = <a>Text2</a>

lxml code:

[bad.getparent().remove(bad) for bad in html.xpath(".//b")]

The output is:

html = <a></a>

because lxml treats "Text2" as the tail of <b></b>, so it is removed together with the element.
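One way to see this is to inspect the text and tail attributes directly. The sketch below assumes lxml is installed and reuses the fragment from the example; the variable names are only illustrative.

```python
from lxml.html import fragment_fromstring, tostring

# Parse the fragment from the question; the returned element is <a>.
root = fragment_fromstring('<a><b>Text</b>Text2</a>')
b = root.find('b')

b_text = b.text     # text inside <b>
b_tail = b.tail     # text after </b>, stored on the <b> element itself
a_text = root.text  # text between <a> and <b>; there is none here

# Removing <b> from its parent therefore takes the tail with it.
root.remove(b)
after_remove = tostring(root, encoding='unicode')

print(repr(b_text), repr(b_tail), repr(a_text), after_remove)
```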

If we only need the text that results from joining the tags' contents, we can blank out the text of the unwanted tags:

for bad in raw.xpath(xpath_search):
    bad.text = ''

But how can I do this without changing the text, removing the tag but not its tail?

Answer 1

I did the following to preserve the tail text by moving it to the previous sibling or, if there is none, to the parent.

def remove_keeping_tail(self, element):
    """Save the tail text and then delete the element"""
    self._preserve_tail_before_delete(element)
    element.getparent().remove(element)

def _preserve_tail_before_delete(self, node):
    if node.tail: # preserve the tail
        previous = node.getprevious()
        if previous is not None: # if there is a previous sibling it will get the tail
            if previous.tail is None:
                previous.tail = node.tail
            else:
                previous.tail = previous.tail + node.tail
        else: # no previous sibling: the parent gets the tail as text
            parent = node.getparent()
            if parent.text is None:
                parent.text = node.tail
            else:
                parent.text = parent.text + node.tail

HTH

Answer 2

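While the accepted answer from phlou will work, there are easier ways to remove tags without also removing their tails.

If you want to remove a specific element, the lxml method you are looking for is drop_tree, which is provided by lxml.html elements.

From the docs:

Drops the element and all its children. Unlike el.getparent().remove(el), this does not remove the tail text; with drop_tree the tail text is merged with the previous element.

If you want to remove all instances of a specific tag, you can use lxml.etree.strip_elements or lxml.html.etree.strip_elements with with_tail=False.

Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their entire subtree, including all their attributes, text content and descendants. It will also remove the tail text of the element unless you explicitly set the with_tail keyword argument option to False.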

So, for the example in the original post:

>>> from lxml.html import fragment_fromstring, tostring
>>>
>>> html = fragment_fromstring('<a><b>Text</b>Text2</a>')
>>> for bad in html.xpath('.//b'):
...    bad.drop_tree()
>>> tostring(html, encoding='unicode')
'<a>Text2</a>'

Or:

>>> from lxml.html import fragment_fromstring, tostring, etree
>>>
>>> html = fragment_fromstring('<a><b>Text</b>Text2</a>')
>>> etree.strip_elements(html, 'b', with_tail=False)
>>> tostring(html, encoding='unicode')
'<a>Text2</a>'
