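Questions tagged lxml

lxml is a full-featured, high-performance Python library for processing XML and HTML.

I am extracting Amazon bestseller data: book title, author name, and book price. For this task I am using the BeautifulSoup and requests libraries. The URL is - https://www.amazon.in/gp/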

python web-scraping beautifulsoup lxml price

Answers: 1 Votes: 0

Given an XML file with the following structure:

    <log>
      <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
        <System>
          <EventID>1</EventID>  <!-- if this is 1 -->
        </System>
        <EventData>
          <Data Name="CommandLine">C:\Windows\system32\wbem\unsecapp.exe -Embedding</Data>  <!-- then I want this value -->
        </EventData>
      </Event>
      <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
        <System>
          <EventID>2</EventID>
        </System>
        <EventData>
          <Data Name="CommandLine">C:\Windows\system32\wbem\unsecapp.exe -Embedding</Data>
        </EventData>
      </Event>
    </log>

I want to check every <Event>: if <EventID> = 1, then use the value of <Data Name='CommandLine'>. With this code

    from lxml import etree as ET

    with open(log_file_path, 'r', encoding='utf-8') as file:
        log_content = file.read()

    root = ET.fromstring(log_content)
    ns = {'ns': 'http://schemas.microsoft.com/win/2004/08/events/event'}
    root.xpath("//ns:Event[System/EventID='1']/EventData/Data[@Name='CommandLine']", namespaces=ns)

nothing is found. I tried the same XPath query, //Event[System/EventID='1']/EventData/Data[@Name='CommandLine'], against the same XML in an online XPath tool, and it worked as expected. I can't figure out what the problem is. Any ideas?

A default namespace declared on an element also applies to its descendant elements, so change

    //ns:Event[System/EventID='1']/EventData/Data[@Name='CommandLine']

to

    //ns:Event[ns:System/ns:EventID='1']/ns:EventData/ns:Data[@Name='CommandLine']
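The difference between the two queries above can be checked with a small, self-contained sketch. The inline document and its `first.exe` / `second.exe` values are made-up stand-ins for the real log file, not data from the question.

```python
from lxml import etree

# Prefix chosen in Python; it only has to map to the document's namespace URI.
NS = {"ns": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical sample log (shortened values, for illustration only).
LOG = b"""<log>
  <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
    <System><EventID>1</EventID></System>
    <EventData><Data Name="CommandLine">first.exe -Embedding</Data></EventData>
  </Event>
  <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
    <System><EventID>2</EventID></System>
    <EventData><Data Name="CommandLine">second.exe</Data></EventData>
  </Event>
</log>"""

root = etree.fromstring(LOG)

# Only the first step is prefixed: the unprefixed steps look for elements
# in no namespace, so nothing matches.
unprefixed = root.xpath(
    "//ns:Event[System/EventID='1']/EventData/Data[@Name='CommandLine']",
    namespaces=NS,
)

# Every element step is prefixed, because the default namespace declared on
# <Event> is inherited by all of its descendants.
prefixed = root.xpath(
    "//ns:Event[ns:System/ns:EventID='1']/ns:EventData/ns:Data[@Name='CommandLine']",
    namespaces=NS,
)
command_lines = [data.text for data in prefixed]

print(unprefixed)
print(command_lines)
```

The attribute test `@Name` stays unprefixed: a default namespace applies to elements, not to unprefixed attributes.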

python xml xpath lxml

Answers: 1 Votes: 0

BeautifulSoup(html, "html.parser") and BeautifulSoup(html, "xml") have different find behavior; how can I make them the same?

Parsing HTML with soup_html = BeautifulSoup(html, "html.parser") uses the default parser. Parsing HTML with soup_xml = BeautifulSoup(html, "xml") uses the parser from the lxml library. If H...

web-scraping beautifulsoup lxml

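Answers: 1 Votes: 0

Converting XML to JSON in Python with a separated schema

I want to convert incoming XML data to JSON so that I can process the data more efficiently in Python. The XML is in a non-standard format, where the schema is defined above the related va...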

python json xml lxml data-conversion

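Answers: 1 Votes: 0

Python lxml: finding an element by id-tag

I am developing a Python program to keep the inventory of a storeroom. The XML document will hold the toner quantities, and I want my Python program to be able to add, remove, and display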

python xml xml-parsing lxml

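Answers: 2 Votes: 0

Python etree fails to parse HTML text (returns NoneType)

Why is the output "None"? It should be something like "" or something else. Note: the problem only occurs on my Mac. I have tried using p...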

html python-3.x lxml elementtree

Answers: 1 Votes: 0

Python lxml - retrieving elements using the xml:lang attribute

Question (2 votes)

I have some XML containing multiple elements with the same name, each in a different language, for example:

    <Title xml:lang="FR" type="main">Les Tudors</Title>
    <Title xml:lang="DE" type="main">Die Tudors</Title>
    <Title xml:lang="IT" type="main">The Tudors</Title>

Normally I would retrieve an element using its attribute, like this:

    titlex = info.find('.//xmlns:Title[@someattribute=attributevalue]', namespaces=nsmap)

If I try to do this with [@xml:lang="FR"] (for example), I get a traceback error:

      File "D:/Python code/RBM CRID, Title, Genre/CRID, Title, Genre, Age rating, Episode Number, Descriptions V1.py", line 29, in <module>
        titlex = info.find('.//xmlns:Title[@xml:lang=PL]', namespaces=nsmap)
      File "lxml.etree.pyx", line 1457, in lxml.etree._Element.find (src\lxml\lxml.etree.c:51435)
      File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 282, in find
        it = iterfind(elem, path, namespaces)
      File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 272, in iterfind
        selector = _build_path_iterator(path, namespaces)
      File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 256, in _build_path_iterator
        selector.append(ops[token[0]](_next, token))
      File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 134, in prepare_predicate
        token = next()
      File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 80, in xpath_tokenizer
        raise SyntaxError("prefix %r not found in prefix map" % prefix)
    SyntaxError: prefix 'xml' not found in prefix map

I'm not surprised by this, but I would like advice on how to work around the problem. Thanks!

As requested, a stripped-down but complete set of code (if I remove the [bit in square brackets], it works as expected):

    import lxml
    import codecs

    file_name = (input('Enter the file name, excluding .xml extension: ') + '.xml')  # User inputs file name
    print('Parsing ' + file_name)

    #----- Sets up import and namespace
    from lxml import etree

    parser = lxml.etree.XMLParser()
    tree = lxml.etree.parse(file_name, parser)  # Name of file to test goes here
    root = tree.getroot()

    nsmap = {'xmlns': 'urn:tva:metadata:2012', 'mpeg7': 'urn:tva:mpeg7:2008'}

    #----- This code writes the output to a file
    with codecs.open(file_name+'.log', mode='w', encoding='utf-8') as f:  # Name the output file
        f.write(u'CRID|Title|Genre|Rating|Short Synopsis|Medium Synopsis|Long Synopsis\n')
        for info in root.xpath('//xmlns:ProgramInformation', namespaces=nsmap):
            titlex = info.find('.//xmlns:Title[@xml:lang="PL"]', namespaces=nsmap)  # Retrieve the title
            title = titlex.text if titlex is not None else 'Missing'  # If there isn't a title, print an alternative word
            f.write(u'{}\n'.format(title))  # Write all the retrieved values to the same line with bar separators and a new line

Answer (3 votes)

Using find()

The xml prefix in xml:lang does not need to be declared in the XML document, but if you want to use xml:lang in an XPath lookup (with find() or findall()), you must define a prefix mapping in the Python code.

The xml prefix is reserved (as opposed to arbitrary "normal" namespace prefixes) and is defined as bound to http://www.w3.org/XML/1998/namespace. See the Namespaces in XML 1.0 W3C Recommendation (http://www.w3.org/TR/REC-xml-names/#ns-decl).

Example:

    from lxml import etree

    # Required mapping when using "find"
    nsmap = {"xml": "http://www.w3.org/XML/1998/namespace"}

    XML = """
    <root>
      <Title xml:lang="FR" type="main">Les Tudors</Title>
      <Title xml:lang="DE" type="main">Die Tudors</Title>
      <Title xml:lang="IT" type="main">The Tudors</Title>
    </root>"""

    doc = etree.fromstring(XML)

    title_FR = doc.find('Title[@xml:lang="FR"]', namespaces=nsmap)
    print(title_FR.text)

Output:

    Les Tudors

If the xml prefix has no mapping, you get the "prefix 'xml' not found in prefix map" error. If the URI mapped to the xml prefix is not http://www.w3.org/XML/1998/namespace, the find method in the snippet above returns nothing.

Using xpath()

With the xpath() method, no prefix:URI mapping is needed:

    title_FR = doc.xpath('Title[@xml:lang="FR"]')[0]
    print(title_FR.text)

Output:

    Les Tudors

Answer (0 votes)

If you have control over the XML file, you should change the xml:lang attribute to lang.

Or, if you don't have that control, I suggest adding xml to nsmap, for example -

    nsmap = {'xmlns': 'urn:tva:metadata:2012', 'mpeg7': 'urn:tva:mpeg7:2008', 'xml': '<namespace>'}
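As a complement to the prefix-mapping approach described above, the attribute can also be read without any prefix map by using its fully qualified name in `{namespace-uri}local-name` form. This is only a sketch: the `titles_by_lang` dictionary and the constant names are illustrative choices, not part of the original question or answers.

```python
from lxml import etree

# The reserved namespace URI that the "xml" prefix is always bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"
LANG_ATTR = f"{{{XML_NS}}}lang"  # "{http://www.w3.org/XML/1998/namespace}lang"

doc = etree.fromstring("""
<root>
  <Title xml:lang="FR" type="main">Les Tudors</Title>
  <Title xml:lang="DE" type="main">Die Tudors</Title>
  <Title xml:lang="IT" type="main">The Tudors</Title>
</root>""")

# Index every <Title> by its xml:lang value; .get() takes the qualified name,
# so no prefix-to-URI mapping is involved.
titles_by_lang = {title.get(LANG_ATTR): title.text for title in doc.iter("Title")}

print(titles_by_lang.get("FR", "Missing"))
print(titles_by_lang.get("PL", "Missing"))
```

Building the dictionary once avoids a separate lookup per language and makes the "Missing" fallback from the question a plain `dict.get` default.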

python xml lxml

Answers: 0 Votes: 0

Extracting elements that have no xml:lang attribute

I have the following XML file:

    <components version="1.0.0">
      <component type="foo">
        <sample>Foo</sample>
        <sample xml:lang="a">abc</sample>
        <sample xml:lang="b">efj</sample>
      </component>
    </components>

and this code:

    from lxml import etree

    def parse(path: str):
        return etree.parse(path)

    def components(path: str) -> list:
        components = parse(path).xpath("/components/component")
        return list(components)

    def sample(path: str) -> str:
        sample = components(path)[0].find("sample").text
        return str(sample)

    path = "test.xml"
    print(sample(path))

I want to iterate over all sample tags and get the value of the one that has no xml:lang attribute, i.e. the first one. How can I do that? I know I need to use a for loop, but I'm not sure how to check whether xml:lang is present.

You can check whether lang is absent from the tag's attributes:

    from lxml import etree

    xml_string = """
    <components version="1.0.0">
      <component type="foo">
        <sample>Foo</sample>
        <sample lang="a">abc</sample>
        <sample lang="b">efj</sample>
      </component>
    </components>
    """

    root = etree.fromstring(xml_string)

    for sample in root.findall("component/sample"):
        if "lang" not in sample.attrib:
            print(sample.text)

Prints:

    Foo

Edit: if the lang attribute is namespaced (xml:lang), you can try:

    from lxml import etree

    xml_string = """
    <components version="1.0.0">
      <component type="foo">
        <sample>Foo</sample>
        <sample xml:lang="a">abc</sample>
        <sample xml:lang="b">efj</sample>
      </component>
    </components>
    """

    root = etree.fromstring(xml_string)

    for sample in root.findall("component/sample"):
        # use http://www.w3.org/XML/1998/namespace here
        # or other Namespace URI found in your document
        lang = sample.attrib.get(r"{http://www.w3.org/XML/1998/namespace}lang")
        if not lang:
            print(sample.text)

Your XML snippet has an unclosed tag, and the attribute values a and b must be quoted strings, "a" and "b". Then parsing works, and you can check with .get('attrib_argument'):

    from lxml import etree as et

    xml_str = """<components version="1.0.0">
      <component type="foo">
        <sample>Foo</sample>
        <sample lang="a">abc</sample>
        <sample lang="b">efj</sample>
      </component>
    </components>
    """

    root = et.fromstring(xml_str)

    # enumerate() gives the list position directly
    for position, elem in enumerate(root.findall('.//sample')):
        if elem.get('lang') is None:
            print(f"sample <tag> on list position {position} has no 'lang' attrib, Text: {elem.text}")

Output:

    sample <tag> on list position 0 has no 'lang' attrib, Text: Foo
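The same filtering can probably be pushed into a single XPath expression instead of a Python loop. The sketch below assumes lxml's `xpath()` method, which resolves the reserved `xml` prefix without an explicit mapping; the variable names are illustrative.

```python
from lxml import etree

root = etree.fromstring("""<components version="1.0.0">
  <component type="foo">
    <sample>Foo</sample>
    <sample xml:lang="a">abc</sample>
    <sample xml:lang="b">efj</sample>
  </component>
</components>""")

# not(@xml:lang) keeps only the <sample> elements that lack the attribute.
untranslated = root.xpath("/components/component/sample[not(@xml:lang)]")
texts = [sample.text for sample in untranslated]

print(texts)
```

Note that this relies on `xpath()`; with `find()` or `findall()` the `xml` prefix would need an explicit entry in the namespace map.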

python xml lxml

Answers: 2 Votes: 0

Extracting elements from an XML file

I have the following XML file:

    <components version="1.0.0">
      <component type="foo">
        <sample>Foo</sample>
        <sample lang="a">abc</sample>
        <sample lang="b">efj</sample>
      </component>
    </components>

and this code:

    from lxml import etree

    def parse(path: str):
        return etree.parse(path)

    def components(path: str) -> list:
        components = parse(path).xpath("/components/component")
        return list(components)

    def sample(path: str) -> str:
        sample = components(path)[0].find("sample").text
        return str(sample)

    path = "test.xml"
    print(sample(path))

I want to iterate over all sample tags and get the value of the one that has no lang attribute, i.e. the first one. How can I do that? I know I need to use a for loop, but I'm not sure how to check whether lang is present.

You can check whether lang is absent from the tag's attributes:

    from lxml import etree

    xml_string = """
    <components version="1.0.0">
      <component type="foo">
        <sample>Foo</sample>
        <sample lang="a">abc</sample>
        <sample lang="b">efj</sample>
      </component>
    </components>
    """

    root = etree.fromstring(xml_string)

    for sample in root.findall("component/sample"):
        if "lang" not in sample.attrib:
            print(sample.text)

Prints:

    Foo

python lxml

Answers: 1 Votes: 0

Checking whether a child element exists and is non-empty in an XML file

I have the following XML file:

    <?xml version="1.0" encoding="utf-8"?>
    <components version="1.0.0">
      <component type="foo">
        <maintag>
          <subtag>
            <check>Foo</check>
          </subtag>
          <subtag>
            <check></check>
          </subtag>
          <subtag>
          </subtag>
        </maintag>
      </component>
    </components>

I want to check whether every subtag element has a child element check with a non-empty value. It should print an error if:

- check exists but is empty
- check is missing entirely from one or more subtag elements

How can I do this? I came up with the following, but it doesn't quite do what I want:

    from lxml import etree  # type: ignore

    def parse_xml(path: str) -> list:
        root = etree.parse(path)
        components = root.xpath("/components/component")
        return list(components)

    path = "test.xml"
    for p in parse_xml(path)[0].iter('check'):
        if not len(str(p)) > 0:
            print("check tag empty")

Basically, my idea is: iterate over every subtag in the list of subtags and find its check element. If the check_elements list is empty (i.e., the check element does not exist), print an error message. Otherwise, get its text content and check whether it is empty or contains only whitespace. If it is, print an error message. An example follows:

    # ...rest
    for component in components:
        subtags = component.xpath(".//maintag/subtag")
        for subtag in subtags:
            check_elements = subtag.xpath("./check")
            if not check_elements:
                print(f"check tag not present at line {subtag.sourceline}")
            else:
                check_element = check_elements[0]
                check_text = check_element.text
                if not check_text or check_text.strip() == "":
                    print(f"check tag empty at line {check_element.sourceline}")
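The approach described above can be put together into a complete, runnable sketch. The helper name `find_check_problems` and the choice to collect the problems in a list (rather than print them immediately) are assumptions made for illustration; the XML is the sample from the question, embedded inline instead of read from test.xml.

```python
from lxml import etree

XML = b"""<?xml version="1.0" encoding="utf-8"?>
<components version="1.0.0">
  <component type="foo">
    <maintag>
      <subtag>
        <check>Foo</check>
      </subtag>
      <subtag>
        <check></check>
      </subtag>
      <subtag>
      </subtag>
    </maintag>
  </component>
</components>"""


def find_check_problems(root):
    """Return (source line, message) pairs for subtags with a missing or empty <check>."""
    problems = []
    for subtag in root.xpath("/components/component/maintag/subtag"):
        check = subtag.find("check")
        if check is None:
            # <check> is absent from this <subtag>
            problems.append((subtag.sourceline, "check tag not present"))
        elif not (check.text or "").strip():
            # <check> exists but holds no text, or only whitespace
            problems.append((check.sourceline, "check tag empty"))
    return problems


problems = find_check_problems(etree.fromstring(XML))
for line, message in problems:
    print(f"line {line}: {message}")
```

Comparing with `is None` matters here: an lxml element without children evaluates as false, so `if not check:` would wrongly treat an existing `<check>` as missing.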

python lxml

Answers: 1 Votes: 0

Creating an XML document with multiple namespaces and an xsi:type attribute

How can I create this XML structure using Python and lxml?

    <?xml version="1.0" encoding="utf-8"?>
    <cfdi:Comprobante xmlns:cfdi="http://www.sat.gob.mx/cfd/4" xmlns:cce11="http://www.sat.gob.mx/ComercioExterior11" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sat.gob.mx/cfd/4 http://www.sat.gob.mx/sitio_internet/cfd/4/cfdv40.xsd http://www.sat.gob.mx/ComercioExterior11 http://www.sat.gob.mx/sitio_internet/cfd/ComercioExterior11/ComercioExterior11.xsd" Version="4.0" Fecha="2023-12-27T11:53:50">
      <cfdi:Emisor Rfc="XAXX010101XXX" Nombre="COMPANY" RegimenFiscal="601"/>
      <cfdi:Receptor Rfc="XEXX010101XXX" Nombre="COMPANY" DomicilioFiscalReceptor="00000" RegimenFiscalReceptor="601" UsoCFDI="G01"/>
      <cfdi:Conceptos>
        <cfdi:Concepto ClaveProdServ="00000000" NoIdentificacion="XXXXX" Cantidad="1.000000" ClaveUnidad="EA" Unidad="PIEZA" Descripcion="XXXXX" ValorUnitario="1.00" Importe="1.00" ObjetoImp="00">
          <cfdi:Impuestos>
            <cfdi:Traslados>
              <cfdi:Traslado Base="1.00" Importe="1.00" Impuesto="000" TipoFactor="Tasa" TasaOCuota="0.000000"/>
            </cfdi:Traslados>
          </cfdi:Impuestos>
        </cfdi:Concepto>
      </cfdi:Conceptos>
      <cfdi:Impuestos TotalImpuestosTrasladados="1.00">
        <cfdi:Traslados>
          <cfdi:Traslado Base="1.00" Importe="1.00" Impuesto="000" TipoFactor="Tasa" TasaOCuota="0.000000"/>
          <cfdi:Traslado Base="1.00" Importe="1.00" Impuesto="000" TipoFactor="Tasa" TasaOCuota="0.000000"/>
        </cfdi:Traslados>
      </cfdi:Impuestos>
    </cfdi:Comprobante>

This structure is used for Mexican invoices.

Using a PowerShell script. I hard-coded the values. I also hard-coded the total.

    using assembly System.Xml.Linq

    $filename = 'c:\temp\test.xml'

    # Hard-coded attribute values
    $emisorRfc = 'XAXX010101XXX'
    $receptorRfc = 'XEXX010101XXX'
    $nombre = 'COMPANY'
    $regimenFiscal = '601'
    $domicilioFiscalReceptor = '00000'
    $regimenFiscalReceptor = '601'
    $usoCFDI = 'G01'
    $claveProdServ = '00000000'
    $noIdentificacion = 'XXXXX'
    $cantidad = '1.000000'
    $claveUnidad = 'EA'
    $unidad = 'PIEZA'
    $descripcion = 'XXXXX'
    $valorUnitario = '1.00'
    $importe = '1.00'
    $objetoImp = '00'
    $base = '1.00'
    $impuesto = '000'
    $tipoFactor = 'Tasa'
    $tasaOCuota = '0.000000'

    # Root element with all namespace declarations
    $ident = @'
    <?xml version="1.0" encoding="utf-8"?>
    <cfdi:Comprobante xmlns:cfdi="http://www.sat.gob.mx/cfd/4" xmlns:cce11="http://www.sat.gob.mx/ComercioExterior11" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sat.gob.mx/cfd/4 http://www.sat.gob.mx/sitio_internet/cfd/4/cfdv40.xsd http://www.sat.gob.mx/ComercioExterior11 http://www.sat.gob.mx/sitio_internet/cfd/ComercioExterior11/ComercioExterior11.xsd" Version="4.0" Fecha="2023-12-27T11:53:50">
    </cfdi:Comprobante>
    '@

    $xDoc = [System.Xml.Linq.XDocument]::Parse($ident)
    $root = $xDoc.Root
    $nscfdi = $root.GetNamespaceOfPrefix('cfdi')
    $nscce11 = $root.GetNamespaceOfPrefix('cce11')
    $nsxsi = $root.GetNamespaceOfPrefix('xsi')

    # cfdi:Emisor
    $emisor = [System.Xml.Linq.XElement]::new([System.Xml.Linq.XName]::Get($nscfdi + 'Emisor'))
    $emisorRfcAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('Rfc'), $emisorRfc)
    $emisor.Add($emisorRfcAttr)
    $emisorNombre = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('Nombre'), $nombre)
    $emisor.Add($emisorNombre)
    $emisorRegimen = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('RegimenFiscal'), $regimenFiscal)
    $emisor.Add($emisorRegimen)
    $root.Add($emisor)

    # cfdi:Receptor
    $receptor = [System.Xml.Linq.XElement]::new([System.Xml.Linq.XName]::Get($nscfdi + 'Receptor'))
    $receptorRfcAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('Rfc'), $receptorRfc)
    $receptor.Add($receptorRfcAttr)
    $receptor.Add($emisorNombre)
    $receptorDomicilioFiscal = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('DomicilioFiscalReceptor'), $domicilioFiscalReceptor)
    $receptor.Add($receptorDomicilioFiscal)
    $receptorRegimenFiscal = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('RegimenFiscalReceptor'), $regimenFiscalReceptor)
    $receptor.Add($receptorRegimenFiscal)
    $receptorUsoCFDI = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('UsoCFDI'), $usoCFDI)
    $receptor.Add($receptorUsoCFDI)
    $root.Add($receptor)

    # cfdi:Conceptos / cfdi:Concepto
    $conceptos = [System.Xml.Linq.XElement]::new([System.Xml.Linq.XName]::Get($nscfdi + 'Conceptos'))
    $concepto = [System.Xml.Linq.XElement]::new([System.Xml.Linq.XName]::Get($nscfdi + 'Concepto'))
    $claveProdServAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('ClaveProdServ'), $claveProdServ)
    $concepto.Add($claveProdServAttr)
    $noIdentificacionAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('NoIdentificacion'), $noIdentificacion)
    $concepto.Add($noIdentificacionAttr)
    $cantidadAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('Cantidad'), $cantidad)
    $concepto.Add($cantidadAttr)
    $claveUnidadAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('ClaveUnidad'), $claveUnidad)
    $concepto.Add($claveUnidadAttr)
    $unidadAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('Unidad'), $unidad)
    $concepto.Add($unidadAttr)
    $descripcionAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('Descripcion'), $descripcion)
    $concepto.Add($descripcionAttr)
    $valorUnitarioAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('ValorUnitario'), $valorUnitario)
    $concepto.Add($valorUnitarioAttr)
    $importeAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('Importe'), $importe)
    $concepto.Add($importeAttr)
    $objetoImpAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('ObjetoImp'), $objetoImp)
    $concepto.Add($objetoImpAttr)

    # cfdi:Impuestos / cfdi:Traslados / cfdi:Traslado inside the Concepto
    $impuestos = [System.Xml.Linq.XElement]::new([System.Xml.Linq.XName]::Get($nscfdi + 'Impuestos'))
    $traslados = [System.Xml.Linq.XElement]::new([System.Xml.Linq.XName]::Get($nscfdi + 'Traslados'))
    $traslado = [System.Xml.Linq.XElement]::new([System.Xml.Linq.XName]::Get($nscfdi + 'Traslado'))
    $baseAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('Base'), $base)
    $traslado.Add($baseAttr)
    $importeAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('Importe'), $importe)
    $traslado.Add($importeAttr)
    $impuestoAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('Impuesto'), $impuesto)
    $traslado.Add($impuestoAttr)
    $tipoFactorAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('TipoFactor'), $tipoFactor)
    $traslado.Add($tipoFactorAttr)
    $tasaOCuotaAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('TasaOCuota'), $tasaOCuota)
    $traslado.Add($tasaOCuotaAttr)
    $traslados.Add($traslado)
    $impuestos.Add($traslados)
    $concepto.Add($impuestos)
    $conceptos.Add($concepto)
    $root.Add($conceptos)

    # Document-level cfdi:Impuestos with the hard-coded total
    $impuestos = [System.Xml.Linq.XElement]::new([System.Xml.Linq.XName]::Get($nscfdi + 'Impuestos'))
    $totalImpuestosTrasladados = '1.00'
    $totalImpuestosTrasladadosAttr = [System.Xml.Linq.XAttribute]::new([System.Xml.Linq.XName]::Get('TotalImpuestosTrasladados'), $totalImpuestosTrasladados)
    $impuestos.Add($totalImpuestosTrasladadosAttr)
    $root.Add($impuestos)

    $xDoc.Save($filename)
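Since the question asks about Python and lxml, here is a rough sketch of how the same structure might be built there. The helper names `cfdi` and `add_traslado` are invented for this example, and the attribute values are the placeholders from the question; treat it as one possible layout rather than a validated CFDI generator.

```python
from lxml import etree

CFDI = "http://www.sat.gob.mx/cfd/4"
CCE11 = "http://www.sat.gob.mx/ComercioExterior11"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
NSMAP = {"cfdi": CFDI, "cce11": CCE11, "xsi": XSI}

SCHEMA_LOCATION = (
    "http://www.sat.gob.mx/cfd/4 "
    "http://www.sat.gob.mx/sitio_internet/cfd/4/cfdv40.xsd "
    "http://www.sat.gob.mx/ComercioExterior11 "
    "http://www.sat.gob.mx/sitio_internet/cfd/ComercioExterior11/ComercioExterior11.xsd"
)


def cfdi(tag):
    """Qualified name of a tag in the cfdi namespace, in {uri}tag form."""
    return f"{{{CFDI}}}{tag}"


def add_traslado(parent):
    """Append one cfdi:Traslado with the placeholder values from the question."""
    return etree.SubElement(parent, cfdi("Traslado"), Base="1.00", Importe="1.00",
                            Impuesto="000", TipoFactor="Tasa", TasaOCuota="0.000000")


# The nsmap on the root declares all three prefixes once.
root = etree.Element(cfdi("Comprobante"), nsmap=NSMAP)
root.set(f"{{{XSI}}}schemaLocation", SCHEMA_LOCATION)  # serialized as xsi:schemaLocation
root.set("Version", "4.0")
root.set("Fecha", "2023-12-27T11:53:50")

etree.SubElement(root, cfdi("Emisor"), Rfc="XAXX010101XXX", Nombre="COMPANY",
                 RegimenFiscal="601")
etree.SubElement(root, cfdi("Receptor"), Rfc="XEXX010101XXX", Nombre="COMPANY",
                 DomicilioFiscalReceptor="00000", RegimenFiscalReceptor="601",
                 UsoCFDI="G01")

conceptos = etree.SubElement(root, cfdi("Conceptos"))
concepto = etree.SubElement(conceptos, cfdi("Concepto"), ClaveProdServ="00000000",
                            NoIdentificacion="XXXXX", Cantidad="1.000000",
                            ClaveUnidad="EA", Unidad="PIEZA", Descripcion="XXXXX",
                            ValorUnitario="1.00", Importe="1.00", ObjetoImp="00")
concepto_traslados = etree.SubElement(
    etree.SubElement(concepto, cfdi("Impuestos")), cfdi("Traslados"))
add_traslado(concepto_traslados)

impuestos = etree.SubElement(root, cfdi("Impuestos"), TotalImpuestosTrasladados="1.00")
traslados = etree.SubElement(impuestos, cfdi("Traslados"))
add_traslado(traslados)
add_traslado(traslados)

xml_bytes = etree.tostring(root, xml_declaration=True, encoding="utf-8",
                           pretty_print=True)
print(xml_bytes.decode("utf-8"))
```

An attribute such as xsi:type would be set the same way as xsi:schemaLocation here, by using the `{namespace-uri}name` form as the attribute key.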

python xml lxml xml-namespaces

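Answers: 1 Votes: 0

BS4: getting an lxml etree from a BeautifulSoup object

If I use BeautifulSoup4 with the lxml parser, how can I get an lxml etree object from the BeautifulSoup object? I would use it to find elements via XPath. BeautifulSoup4 itself does not support XPath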

web-scraping beautifulsoup lxml

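Answers: 1 Votes: 0

lxml parsing XML: missing root error

I am trying to parse an XML file so that I can manipulate the data it contains. It has 9 million lines, so I won't post it. Here is my code: from lxml import etree parser = etree.XMLPar...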

python xml xml-parsing lxml utf-16

Answers: 1 Votes: 0

Extracting attributes from HTML with lxml

I use lxml to retrieve the attributes of tags from an HTML page. The HTML page is formatted like this:

    <div class="my_div">
      <a href="/foobar">
        <img src="my_img.png">
      </a>
    </div>

The Python script I use to retrieve the URL inside the <a> tag and the src value of the <img> tag inside the same <div> is this:

    from lxml import html

    ...
    tree = html.fromstring(page.text)
    for element in tree.xpath('//div[contains(@class, "my_div")]//a'):
        href = element.xpath('/@href')
        src = element.xpath('//img/@src')

Why can't I get the strings?

You are using lxml, so you are operating on lxml objects - HtmlElement instances. HtmlElement derives from etree.Element: http://lxml.de/api/lxml.etree._Element-class.html, which has a get method that returns the value of an attribute. So the approach that fits your case is:

    from lxml import html

    ...
    tree = html.fromstring(page.text)
    for link_element in tree.xpath('//div[contains(@class, "my_div")]//a'):
        href = link_element.get('href')
        # search for <img> inside the link element, not inside the href string
        image_element = link_element.find('img')
        # compare with None: an element without children evaluates as false
        if image_element is not None:
            img_src = image_element.get('src')

If you change your code to:

    from lxml import html

    ...
    tree = html.fromstring(page.text)
    for element in tree.xpath('//div[contains(@class, "my_div")]//a'):
        href = element.items()[0][1]  # gives you the value corresponding to the key "href"
        src = element.xpath('.//img/@src')[0]  # leading "." keeps the search inside this <a>
        print(href, src)

you will get what you need. The lxml documentation mentions some of this, but I feel it is missing a few things; you may want to consider using an interactive Python shell to explore the attributes of the instances returned by tree.xpath(). Or you could look into another parser altogether, such as BeautifulSoup, which has very good examples and documentation. Just sharing...

The reason you are not getting the result you want is that you are trying to get the attribute from the NEXT child node rather than from the existing node. Look at this:

    from lxml import html

    s = '''<div class="my_div">
      <a href="/foobar">
        <img src="my_img.png">
      </a>
    </div>'''

    tree = html.fromstring(s)

    # when you do path... //a, you are ALREADY at 'a' node
    for el in tree.xpath('//div[contains(@class, "my_div")]//a'):
        # you were trying to get next children /@href, which doesn't exist
        print(el.xpath('@href'))     # you should instead access the existing node's
        print(el.xpath('img/@src'))  # same here, not /img/@src

    ...
    ['/foobar']
    ['my_img.png']

Hope this helps.
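To make the point about the current node more concrete, the following sketch compares absolute and relative XPath expressions evaluated from the same `<a>` element. The two-link page and its file names are invented for the illustration.

```python
from lxml import html

# Hypothetical page with two links, so document-wide and relative
# searches give visibly different results.
PAGE = """<html><body>
<div class="my_div"><a href="/first"><img src="first.png"></a></div>
<div class="my_div"><a href="/second"><img src="second.png"></a></div>
</body></html>"""

tree = html.fromstring(PAGE)
first_link = tree.xpath('//div[contains(@class, "my_div")]//a')[0]

# A leading "/" restarts at the document root, which has no attributes.
absolute_href = first_link.xpath('/@href')

# Without the leading "/", the attribute is read from the current <a>.
relative_href = first_link.xpath('@href')

# "//" also restarts at the document root and finds every <img> on the page.
all_srcs = first_link.xpath('//img/@src')

# ".//" searches only below the current <a>.
own_srcs = first_link.xpath('.//img/@src')

print(absolute_href, relative_href)
print(all_srcs, own_srcs)
```

This is why `//img/@src` appears to work on a page with a single image but returns unrelated images once the page contains more than one.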

python html lxml

Answers: 3 Votes: 0

lxml: ignoring any tags between specific tags

I am trying to extract some specific fields from a huge XML file. Here is an example: <

python xml lxml

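Answers: 1 Votes: 0

Text of an lxml element is not showing

I am having trouble extracting text from an XML file; my code does not give me what I expect to get. Here is part of the code. root = etree.fromstring(xml) title = root.findall('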

python xml parsing lxml elementtree

Answers: 1 Votes: 0

python lxml: appending/modifying/replacing the innerHTML of an HTML element

I am using lxml to parse sample HTML. Like this: import lxml.html __dom = lxml.html.fromstring("") ...