Questions about lxml

lxml is a full-featured, high-performance Python library for processing XML and HTML.

Python lxml parser does not return the full text of a <p> element if it contains an <xref>

I am trying to use lxml to extract the text from all <p> elements in an article in .xml format. Here is a sample of the article:

    <title>Continuing Education Activity</title>
    <p>Ebstein anomaly is a rare congenital heart disease that involves the apical displacement of the tricuspid valve with adherence of the septal and posterior leaflets to the myocardium and &#x00022;atrialization of the inlet portion of the right ventricle&#x00022;. It is usually accompanied by tricuspid regurgitation, right ventricular failure, and arrhythmias. Clinical manifestations range from asymptomatic to severe, depending on the degree of tricuspid valve displacement and severity of regurgitation, the effective right ventricular volume, and the associated malformations (i.e., pulmonary valve stenosis, atresia, atrial septal defect, etc.). Arrhythmias are common and protracted due to the likelihood of having accessory pathways, in addition to having right atrial dilatation. Symptomatic patients can present with cyanosis, congestive heart failure, and arrhythmias, with exertional dyspnea being common in older patients. This activity reviews the pathophysiology and presentation of Ebstein's malformation and highlights the role of the interprofessional team in its management.</p>
    <p> <bold>Objectives:</bold> <list list-type="bullet"><list-item><p>Describe the pathophysiology of Ebstein anomaly.</p></list-item><list-item><p>Review the clinical presentation of a patient with an Ebstein anomaly.</p></list-item><list-item><p>Outline the approach to the management of patients with Ebstein anomaly, including the indications for non-surgical and surgical intervention.</p></list-item><list-item><p>Summarize the importance of improving care coordination among interprofessional team members to improve outcomes for patients affected by Ebstein anomaly.</p></list-item></list> <ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://www.statpearls.com/account/trialuserreg/?articleid=20850&#x00026;utm_source=pubmed&#x00026;utm_campaign=reviews&#x00026;utm_content=20850">Access free multiple choice questions on this topic.</ext-link> </p>
    </sec>
    <sec id="article-20850.s2" sec-type="pubmed-excerpt">
    <title>Introduction</title>
    <p>Ebstein anomaly is a rare congenital abnormality involving the tricuspid valve and the right ventricle (RV),<xref ref-type="bibr" rid="article-20850.r1">[1]</xref>&#x000a0;with an incidence of &#x0003c;1% of congenital heart defects.<xref ref-type="bibr" rid="article-20850.r2">[2]</xref>&#x000a0;It was first described by the pathologist Wilhelm Ebstein in 1866 when he performed an autopsy of a 19-year-old cyanotic&#x000a0;male who had suffered from exertional dyspnea and palpitations and died of a sudden cardiac arrest.<xref ref-type="bibr" rid="article-20850.r3">[3]</xref>&#x000a0;Ebstein anomaly&#x000a0;is defined by the following characteristics:</p>

Note how the last <p> element is interspersed with <xref> elements that serve as citations. When I extract the text with the following Python code:

    from lxml import etree

    def extract_text(filename):
        chunks = []
        tree = etree.parse('./data/statpearls_NBK430685/' + filename)
        root = tree.getroot()
        p_tags = tree.findall('.//p')
        # list_tags = tree.findall('.//list') # whenever there's a list, include the para above as well as context.
        for p in p_tags:
            if p.text is None:
                continue
            elif not any(char.isalpha() for char in p.text):
                # check that there are some alphabetical characters and ignore if there aren't
                continue
            chunks.append(p.text)
        return chunks

    extract_text('article-20850.nxml')

this is the output:

    ['Ebstein anomaly is a rare congenital heart disease that involves the apical displacement of the tricuspid valve with adherence of the septal and posterior leaflets to the myocardium and "atrialization of the inlet portion of the right ventricle". It is usually accompanied by tricuspid regurgitation, right ventricular failure, and arrhythmias. Clinical manifestations range from asymptomatic to severe, depending on the degree of tricuspid valve displacement and severity of regurgitation, the effective right ventricular volume, and the associated malformations (i.e., pulmonary valve stenosis, atresia, atrial septal defect, etc.). Arrhythmias are common and protracted due to the likelihood of having accessory pathways, in addition to having right atrial dilatation. Symptomatic patients can present with cyanosis, congestive heart failure, and arrhythmias, with exertional dyspnea being common in older patients. This activity reviews the pathophysiology and presentation of Ebstein\'s malformation and highlights the role of the interprofessional team in its management.',
     'Describe the pathophysiology of Ebstein anomaly.',
     'Review the clinical presentation of a patient with an Ebstein anomaly.',
     'Outline the approach to the management of patients with Ebstein anomaly, including the indications for non-surgical and surgical intervention.',
     'Summarize the importance of improving care coordination among interprofessional team members to improve outcomes for patients affected by Ebstein anomaly.',
     'Ebstein anomaly is a rare congenital abnormality involving the tricuspid valve and the right ventricle (RV),']

The last chunk is missing all of the text after the first <xref> tag. Does anyone know what causes this behavior and how to prevent it?

I suggest using the beautifulsoup library to parse this mixed HTML/XML file:

    from bs4 import BeautifulSoup

    text = """\
    <title>Continuing Education Activity</title>
    <p>Ebstein anomaly is a rare congenital heart disease that involves the apical displacement of the tricuspid valve with adherence of the septal and posterior leaflets to the myocardium and &#x00022;atrialization of the inlet portion of the right ventricle&#x00022;. It is usually accompanied by tricuspid regurgitation, right ventricular failure, and arrhythmias. Clinical manifestations range from asymptomatic to severe, depending on the degree of tricuspid valve displacement and severity of regurgitation, the effective right ventricular volume, and the associated malformations (i.e., pulmonary valve stenosis, atresia, atrial septal defect, etc.). Arrhythmias are common and protracted due to the likelihood of having accessory pathways, in addition to having right atrial dilatation. Symptomatic patients can present with cyanosis, congestive heart failure, and arrhythmias, with exertional dyspnea being common in older patients. This activity reviews the pathophysiology and presentation of Ebstein's malformation and highlights the role of the interprofessional team in its management.</p>
    <p> <bold>Objectives:</bold> <list list-type="bullet"><list-item><p>Describe the pathophysiology of Ebstein anomaly.</p></list-item><list-item><p>Review the clinical presentation of a patient with an Ebstein anomaly.</p></list-item><list-item><p>Outline the approach to the management of patients with Ebstein anomaly, including the indications for non-surgical and surgical intervention.</p></list-item><list-item><p>Summarize the importance of improving care coordination among interprofessional team members to improve outcomes for patients affected by Ebstein anomaly.</p></list-item></list> <ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://www.statpearls.com/account/trialuserreg/?articleid=20850&#x00026;utm_source=pubmed&#x00026;utm_campaign=reviews&#x00026;utm_content=20850">Access free multiple choice questions on this topic.</ext-link> </p>
    </sec>
    <sec id="article-20850.s2" sec-type="pubmed-excerpt">
    <title>Introduction</title>
    <p>Ebstein anomaly is a rare congenital abnormality involving the tricuspid valve and the right ventricle (RV),<xref ref-type="bibr" rid="article-20850.r1">[1]</xref>&#x000a0;with an incidence of &#x0003c;1% of congenital heart defects.<xref ref-type="bibr" rid="article-20850.r2">[2]</xref>&#x000a0;It was first described by the pathologist Wilhelm Ebstein in 1866 when he performed an autopsy of a 19-year-old cyanotic&#x000a0;male who had suffered from exertional dyspnea and palpitations and died of a sudden cardiac arrest.<xref ref-type="bibr" rid="article-20850.r3">[3]</xref>&#x000a0;Ebstein anomaly&#x000a0;is defined by the following characteristics:</p>
    """

    soup = BeautifulSoup(text, "html.parser")

    # remove <xref> to not appear in text
    for xref in soup.select("xref"):
        xref.extract()

    for p in soup.select("p"):
        print(p.get_text(strip=True, separator=" "))
        print("-" * 80)

Prints:

    Ebstein anomaly is a rare congenital heart disease that involves the apical displacement of the tricuspid valve with adherence of the septal and posterior leaflets to the myocardium and "atrialization of the inlet portion of the right ventricle". It is usually accompanied by tricuspid regurgitation, right ventricular failure, and arrhythmias. Clinical manifestations range from asymptomatic to severe, depending on the degree of tricuspid valve displacement and severity of regurgitation, the effective right ventricular volume, and the associated malformations (i.e., pulmonary valve stenosis, atresia, atrial septal defect, etc.). Arrhythmias are common and protracted due to the likelihood of having accessory pathways, in addition to having right atrial dilatation. Symptomatic patients can present with cyanosis, congestive heart failure, and arrhythmias, with exertional dyspnea being common in older patients. This activity reviews the pathophysiology and presentation of Ebstein's malformation and highlights the role of the interprofessional team in its management.
    --------------------------------------------------------------------------------
    Objectives: Describe the pathophysiology of Ebstein anomaly. Review the clinical presentation of a patient with an Ebstein anomaly. Outline the approach to the management of patients with Ebstein anomaly, including the indications for non-surgical and surgical intervention. Summarize the importance of improving care coordination among interprofessional team members to improve outcomes for patients affected by Ebstein anomaly. Access free multiple choice questions on this topic.
    --------------------------------------------------------------------------------
    Describe the pathophysiology of Ebstein anomaly.
    --------------------------------------------------------------------------------
    Review the clinical presentation of a patient with an Ebstein anomaly.
    --------------------------------------------------------------------------------
    Outline the approach to the management of patients with Ebstein anomaly, including the indications for non-surgical and surgical intervention.
    --------------------------------------------------------------------------------
    Summarize the importance of improving care coordination among interprofessional team members to improve outcomes for patients affected by Ebstein anomaly.
    --------------------------------------------------------------------------------
    Ebstein anomaly is a rare congenital abnormality involving the tricuspid valve and the right ventricle (RV), with an incidence of <1% of congenital heart defects. It was first described by the pathologist Wilhelm Ebstein in 1866 when he performed an autopsy of a 19-year-old cyanotic male who had suffered from exertional dyspnea and palpitations and died of a sudden cardiac arrest. Ebstein anomaly is defined by the following characteristics:
    --------------------------------------------------------------------------------

Answers: 1 Votes: 0
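For readers who want to stay within lxml: in the ElementTree data model, an element's .text holds only the text before its first child, and the text that follows a child's closing tag is stored in that child's .tail, which is why the last chunk stops at the first <xref>. The sketch below is one possible way to collect the full paragraph text while dropping the citation markers. The helper name paragraph_text and the shortened SAMPLE string are assumptions made for illustration; they are not part of the original post.

```python
from lxml import etree

# Shortened, hypothetical stand-in for the article's last paragraph.
SAMPLE = (
    '<sec><p>Ebstein anomaly involves the right ventricle (RV),'
    '<xref ref-type="bibr" rid="r1">[1]</xref>'
    ' with an incidence of &lt;1%.'
    '<xref ref-type="bibr" rid="r2">[2]</xref>'
    ' It was first described in 1866.</p></sec>'
)


def paragraph_text(p, skip_tags=("xref",)):
    """Return all text inside p, including text that follows child elements.

    The content of elements named in skip_tags is dropped, but their tail
    (the text after the closing tag) is kept.
    """
    parts = [p.text or ""]
    for child in p:
        # Comments and processing instructions have a non-string tag; skip their content.
        if isinstance(child.tag, str) and child.tag not in skip_tags:
            parts.append(paragraph_text(child, skip_tags))
        parts.append(child.tail or "")
    return "".join(parts)


root = etree.fromstring(SAMPLE)
for p in root.iter("p"):
    print(paragraph_text(p))
```

If the citation markers may stay in the text, "".join(p.itertext()) is a shorter alternative, since itertext() yields both .text and .tail strings in document order.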

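Why do I get "ModuleNotFoundError: No module named 'lxml'" in an Azure DevOps pipeline even though the library is installed?

I am trying to run a simple Python script that works fine locally, but it keeps hitting the same error in a DevOps pipeline. I have included the installation of the library in the yaml file and in...

Answers: 1 Votes: 0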
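The excerpt is cut off, so the actual cause is not known here. One frequent reason for this kind of error is that the package was installed into a different interpreter than the one that runs the script. The following diagnostic is a minimal sketch, using only the standard library, that can be run as a pipeline step to check this. The function name describe_module is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and whether `name` is importable from it."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    print(describe_module("lxml"))
```

If "found" is False, installing with the same interpreter that runs the script (for example `python -m pip install lxml` instead of a bare `pip install lxml`) is a reasonable next thing to try.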

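How to ignore non-content elements using XPath in lxml

I am trying to process a bunch of XML files and add certain attributes to specific elements when certain conditions are met. I have different versions of the same XML document. Some of them have...

Answers: 1 Votes: 0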

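Stripping the lxml root tag in Python

Given the sample country.xml file, I want to copy each country into a new output.xml file as a child element of a new root. The problem is that when I append each country, I get duplicate...

Answers: 1 Votes: 0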
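The excerpt is truncated, so the exact cause of the duplication is not visible. One lxml behavior worth knowing in this situation is that append() moves an element that already has a parent instead of copying it, so copying explicitly keeps the source tree intact. The sketch below assumes a small inline document in place of country.xml; the new root tag "countries" and the helper name copy_countries are hypothetical.

```python
import copy

from lxml import etree

# Hypothetical stand-in for the contents of country.xml.
SOURCE = b"""<data>
  <country name="Liechtenstein"><rank>1</rank></country>
  <country name="Singapore"><rank>4</rank></country>
</data>"""


def copy_countries(source_root, new_root_tag="countries"):
    """Copy every <country> under a fresh root, leaving the source tree untouched."""
    new_root = etree.Element(new_root_tag)
    for country in source_root.iter("country"):
        # deepcopy avoids moving the element out of the source tree.
        new_root.append(copy.deepcopy(country))
    return new_root


source_root = etree.fromstring(SOURCE)
new_root = copy_countries(source_root)
print(etree.tostring(new_root, pretty_print=True).decode())
```

To write the result to a file, etree.ElementTree(new_root).write("output.xml", pretty_print=True) would be one option.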

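Structural pattern matching on lxml HtmlElement attributes

I want to use PEP 634 (Structural Pattern Matching) to match an HtmlElement that has specific attributes. The attributes are accessible through the .attrib property, which returns...

Answers: 1 Votes: 0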

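Parsing a file containing repeated elements with lxml

I am trying to process a GDML file, which is not what I would call a flat structure. Rather than ... having a single element, I...

Answers: 1 Votes: 0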

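Python3, lxml and "Symbol not found: _lzma_auto_decoder" on Mac OS X 10.9

I installed Python 3 using homebrew, then installed pip3 and lxml. The following line: from lxml import etree results in the following error: $ python3 Python...

Answers: 5 Votes: 0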

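AWS Lambda Python 3.11: Cannot import lxml: libxslt.so.1: cannot open shared object file: No such file or directory

I have a Python function on AWS Lambda that depends on lxml. The dependency layer contains the result of installing lxml with poetry, but I get the following error at runtime: "errorMessage": ...

Answers: 1 Votes: 0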

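XML formatting issue

I am using the Salesforce SOAP API for the first time, so I am not familiar with SOAP formatting issues and the like. I use the lxml library to generate the XML, but there seems to be a formatting problem...

Answers: 3 Votes: 0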

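I am trying to install Scrapy, but this is the error I get: Failed building wheel for lxml. Please help

I get the error Failed building wheel for lxml: src/lxml/etree.c:96:10: fatal error: 'Python.h' file not found #include "Python.h" ^~~~~~~~~~ 1 error generated. error: could not build...

Answers: 2 Votes: 0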

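Using lxml with django/python - list index out of range

I have a small problem. I am trying to extract some data from XML using lxml, but I keep getting a "list index out of range" error. Right now I am trying to get position [0] of a list, which should...

Answers: 1 Votes: 0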
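The asker's XML and XPath are not shown, so the following is only a general sketch. For node-set expressions, lxml's xpath() returns a list, and that list is empty when nothing matches, so indexing it with [0] raises IndexError. A small guard avoids that; the helper name first_or_default and the sample document are hypothetical.

```python
from lxml import etree


def first_or_default(element, path, default=None):
    """Return the first XPath result, or `default` when nothing matches.

    Intended for expressions that select nodes (elements, attributes, text),
    for which xpath() returns a list.
    """
    results = element.xpath(path)
    return results[0] if results else default


root = etree.fromstring("<order><id>42</id></order>")
print(first_or_default(root, "id/text()"))
print(first_or_default(root, "customer/text()", default="unknown"))
```

When the list is unexpectedly empty, common things to check are namespaces in the document and whether the path is relative to the element it is called on.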

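How to fix: raise ImportError("lxml not found, please install it")

I currently host my Python Flask application on PythonAnywhere. When I run my scraping script, it uses the code df = pd.read_html(current_data.content) and I get the error in the title. Running...

Answers: 1 Votes: 0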

<?xml version="1.0" encoding="UTF-8"?> instead of <?xml version='1.0' encoding='UTF-8'?>

I am using lxml's tree.write(xmlFileOut, pretty_print = True, xml_declaration = True, encoding='UTF-8') to write out an xml file that I opened and edited, but I absolutely need the xml declaration...

Answers: 3 Votes: 0
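Both quoting styles are equivalent to an XML parser, but some consumers compare the declaration as a literal string. One possible workaround, sketched below, is to serialize without a declaration and prepend the desired one manually. The function name and the sample tree are assumptions for illustration.

```python
from lxml import etree


def serialize_with_double_quoted_declaration(tree):
    """Serialize to UTF-8 bytes with a hand-written, double-quoted XML declaration."""
    body = etree.tostring(
        tree, pretty_print=True, encoding="UTF-8", xml_declaration=False
    )
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + body


tree = etree.ElementTree(etree.fromstring("<root><item>1</item></root>"))
data = serialize_with_double_quoted_declaration(tree)
print(data.decode("utf-8"))
```

The resulting bytes can be written with open(path, "wb"). The encoding named in the declaration has to match the encoding passed to tostring().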

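How do I get the path of an element in lxml?

I am using XPath in lxml in Python to search an HTML document. How do I get the path of a certain element? Here is an example from ruby nokogiri: page.xpath('//text()').each do |textnode| ...

Answers: 4 Votes: 0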
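In lxml, the ElementTree object offers getpath(), which returns an XPath expression locating a given element. The sketch below mirrors the text-node loop from the question: each text result from xpath() exposes getparent(), and the path of that parent is looked up. The sample document and the helper name text_node_paths are assumptions.

```python
from lxml import etree

root = etree.fromstring(
    "<html><body><p>Hi</p><div><p>One</p><p>Two</p></div></body></html>"
)
tree = root.getroottree()


def text_node_paths(tree):
    """Pair each text node with the XPath of the element that holds it."""
    pairs = []
    for text in tree.xpath("//text()"):
        # For tail text, getparent() is the element the text follows.
        parent = text.getparent()
        pairs.append((tree.getpath(parent), str(text)))
    return pairs


for path, text in text_node_paths(tree):
    print(path, repr(text))
```

A positional index such as p[1] appears only where sibling elements share the same tag.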

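How to install lxml in Python 3.8 under Cygwin?

I have been trying to install the cython and lxml packages under Python 3.8 on Cygwin using pip install. However, this fails repeatedly with incomprehensible errors ranging from python errors to gcc errors...

Answers: 2 Votes: 0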

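How to force indentation of nested python LXML xml elements when writing iteratively?

I am writing an xml file with LXML that is a dump of a database. Given the size of the data, I have to write the xml file iteratively. When dumping the etree to a file, the server runs out of memory...

Answers: 0 Votes: 0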
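lxml provides an incremental writer, etree.xmlfile, that emits elements one at a time instead of holding the whole tree in memory. The following is a minimal sketch under assumed names: the element names dump, row, and name, the function dump_rows, and the in-memory buffer are all hypothetical stand-ins for the real database dump.

```python
import io

from lxml import etree


def dump_rows(out, rows):
    """Write rows one at a time so the full tree is never held in memory."""
    with etree.xmlfile(out, encoding="utf-8") as xf:
        xf.write_declaration()
        with xf.element("dump"):
            xf.write("\n")  # start the first row on its own line
            for row_id, name in rows:
                row = etree.Element("row", id=str(row_id))
                etree.SubElement(row, "name").text = name
                # Each row is serialized and released before the next one is built.
                xf.write(row, pretty_print=True)


buffer = io.BytesIO()
dump_rows(buffer, [(1, "alpha"), (2, "beta")])
print(buffer.getvalue().decode("utf-8"))
```

With pretty_print=True each row is indented internally, but it may not be indented relative to the enclosing root element, so extra whitespace would have to be written by hand if that matters.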

Loop is not scraping multiple pages, it just keeps returning data from one page

import requests from bs4 import BeautifulSoup import pandas as pd headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari...

Answers: 0 Votes: 0

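openpyxl: get the xml source code of a worksheet without zipfile

from openpyxl import load_workbook wb = load_workbook('file.xlsx') ws = wb['Sheet1'] Is there a way to retrieve the xml code that represents the ws object? Note: I want to avoid using zipfile ...

Answers: 1 Votes: 0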

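Question about the Beautiful Soup 4 module

I am confused because this code sometimes works and sometimes does not. The code is based on the Beautiful Soup module. I would like to know why it works in some cases and why it does not in others...

Answers: 2 Votes: 0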

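Cannot use the translate() method in xpath with lxml etree

I want to lowercase my text using translate with the lxml library in Python. My code is as follows: r = element.xpath('./a/translate(text(), "A", "a")') but it gives me an exception: lxml...

Answers: 1 Votes: 0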
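lxml evaluates XPath 1.0, where a function call cannot be used as a step after "/", so translate() has to wrap the path instead. The sketch below lowercases ASCII letters this way; the sample element and the variable names UPPER and LOWER are assumptions for illustration.

```python
from lxml import etree

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

element = etree.fromstring("<div><a>HELLO World</a></div>")

# translate() wraps the whole path; string(./a) is the text of the first <a>.
# $upper and $lower are XPath variables supplied as keyword arguments.
lowered = element.xpath(
    "translate(string(./a), $upper, $lower)", upper=UPPER, lower=LOWER
)
print(lowered)
```

This returns a single string for the first matching <a>. To lowercase every match, selecting the elements with element.xpath("./a") and calling .lower() on each one's text in Python is a simpler alternative.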

© www.soinside.com 2019 - 2024. All rights reserved.