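Questions tagged html-parsing

HTML parsing is the process of consuming a serialization of an HTML document and producing a representation that can be worked with programmatically, for example to extract data from it. The HTML specification defines a standard algorithm for parsing HTML, which is implemented in all major browsers.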

I have this code, for example:
    <div class = "las">
      <div class = "asas">
        <table style="width:100%">
          <tr> <th>Firstname</th> <th>Lastname</th> <th>Age</th> </tr>
          <tr> <td>Jill</td> <td>Smith</td> <td>50</td> </tr>
          <tr> <td>Eve</td> <td>Jackson</td> <td>94</td> </tr>
          <tr> <td>John</td> <td>Doe</td> <td>80</td> </tr>
        </table>
      </div class = "las">
    </div class = "asas">
I have saved it in a variable named "code". How can I access the <td>Smith</td> tag with something like code[0][0][1][1]? I use Beautiful Soup, and the only way I know to iterate over nested tags is with .parents and .children, which gets very messy.
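The index-style access the question asks for can be sketched with the standard library alone. This is a minimal sketch, not a Beautiful Soup recipe: the `Node` and `TreeBuilder` names are hypothetical, the sample markup uses plain `</div>` closing tags instead of the attribute-bearing ones in the question, and whitespace between tags is simply discarded.

```python
from html.parser import HTMLParser


class Node(list):
    """An element whose child elements are reachable by index: node[0][1]."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects from the start/end tag events."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text += data.strip()


html = """
<div class="las"> <div class="asas"> <table style="width:100%">
<tr> <th>Firstname</th> <th>Lastname</th> <th>Age</th> </tr>
<tr> <td>Jill</td> <td>Smith</td> <td>50</td> </tr>
<tr> <td>Eve</td> <td>Jackson</td> <td>94</td> </tr>
</table> </div> </div>
"""

builder = TreeBuilder()
builder.feed(html)
code = builder.root[0]            # the outer <div>
cell = code[0][0][1][1]           # inner div -> table -> 2nd row -> 2nd cell
print(cell.tag, cell.text)
```

Each index counts element children only, so `code[0][0][1][1]` reads as inner div, table, second row, second cell.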

python beautifulsoup html-parsing

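Answers: 0, votes: 0

Using ParseDelegator to find the first occurrence of /wiki/Geographic_coordinate_system in the input or its children

Given an input such as Linux, it looks for the first instance of /wiki/Geographic_coordinate_system; if none is found, it looks through the direct children of the input to find it. My expected output is ...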
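The lookup described above (check the input page first, then its direct children) can be sketched as follows. This is only an illustration in Python, not the Java code the question refers to. The `pages` dictionary is a hypothetical stand-in for fetching pages, and the return value (the path of pages leading to the link) is an assumption, since the expected output is cut off in the question.

```python
from html.parser import HTMLParser

TARGET = "/wiki/Geographic_coordinate_system"


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_in(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def find_target(start, pages, target=TARGET):
    """Return [start] or [start, child] for the first page linking to target."""
    links = links_in(pages.get(start, ""))
    if target in links:
        return [start]
    for child in links:                      # direct children only
        if target in links_in(pages.get(child, "")):
            return [start, child]
    return None


# Hypothetical page contents standing in for real downloads.
pages = {
    "/wiki/Linux": '<a href="/wiki/Unix">Unix</a> <a href="/wiki/Finland">Finland</a>',
    "/wiki/Unix": '<a href="/wiki/C">C</a>',
    "/wiki/Finland": '<a href="/wiki/Geographic_coordinate_system">coordinates</a>',
}
print(find_target("/wiki/Linux", pages))
```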

java html-parsing wikipedia

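Answers: 0, votes: 0

Parsing string input in Angular 2

I am building an Angular application in which I need to take the values from inputs of type range, use them in a calculation, and display the result. I am getting the input range values to display in another section ...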

html angular typescript input html-parsing

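Answers: 2, votes: 0

How can I use a regular expression to match only the heading between <h></h> tags, without returning the tags themselves?

I want to match the headings h1 through h6 in an HTML file, without returning the h tags themselves, using a regular expression. Consider the following HTML file. I want to match "Welco ...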
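A minimal sketch of two ways to do this in Python, shown on a made-up sample string because the HTML file in the question is cut off. The names `HEADING`, `BARE_HEADING`, and `heading_texts` are hypothetical. Regular expressions only cope with simple, well-formed markup, so an HTML parser is usually the safer choice.

```python
import re

# Capture-group variant: tolerates attributes and headings that span lines.
# The backreference \1 makes <h2> close with </h2>, not with </h3>.
HEADING = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)

# Lookaround variant: the match itself excludes the tags. Python requires a
# fixed-width lookbehind, so this only handles attribute-free tags like <h2>.
BARE_HEADING = re.compile(r"(?<=<h[1-6]>).*?(?=</h[1-6]>)")


def heading_texts(html):
    """Return the inner text of every h1..h6 element, without the tags."""
    return [match.group(2) for match in HEADING.finditer(html)]


sample = '<h1>Intro</h1><p>body</p><h2 class="sub">More\ninfo</h2>'
print(heading_texts(sample))
print(BARE_HEADING.findall(sample))
```

The non-greedy `.*?` stops at the first closing tag, which is what keeps two headings on one line from being merged into a single match.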

regex html-parsing regex-lookarounds non-greedy

Answers: 1, votes: 0

An HTML parser that detects the context at a given position

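I want to write a program that, given some HTML and a character position within it, returns the context that the position belongs to. For example, for the following HTML:
    <!DOCTYPE html>
    <html lang="en">
    <head> <title>Hello, world!</title> </head>
    <body>
    <h1 style="[1]">Hello, world!</h1>
    <p>hello [2] world</p>
    <script> var a = "hello[3]"; [4] </script>
    </body>
    </html>
For the position marked [1] the program would return html-attribute, for [2] html-content, for [3] script-string, and for [4] script-other. I was inspired by the XmlReader class in C#. Any programming language is fine with me, although I would prefer Ruby. I also want the program to be efficient (for example, it should avoid building a full, heavyweight DOM structure for the HTML). I do not want to write the program from scratch; instead, I would like to use an existing library or module. I would be glad if someone could help me.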
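The single-pass idea behind this question can be sketched with a small hand-written state machine that scans only the text before the position and never builds a DOM. It is not the existing library the asker is looking for. The function name `context_at` and the extra `html-tag` result (for positions inside tag markup but outside a quoted value) are assumptions. The sketch ignores HTML comments, unquoted attribute values, JavaScript comments, and template or regex literals.

```python
def context_at(html, pos):
    """Classify the character at index `pos` by scanning html[:pos] once."""
    state = "html-content"
    quote = ""        # quote character that opened the current string or value
    tag_start = 0     # index of the '<' that opened the current tag
    i = 0
    while i < pos:
        ch = html[i]
        in_script = state in ("script-other", "script-string")
        if in_script and html[i:i + 8].lower() == "</script":
            state, tag_start = "html-tag", i      # the script element ends here
        elif state == "html-content":
            if ch == "<":
                state, tag_start = "html-tag", i
        elif state == "html-tag":
            if ch in "\"'":
                state, quote = "html-attribute", ch
            elif ch == ">":
                words = html[tag_start + 1:i].split()
                name = words[0].lower() if words else ""
                state = "script-other" if name == "script" else "html-content"
        elif state == "html-attribute":
            if ch == quote:
                state = "html-tag"
        elif state == "script-other":
            if ch in "\"'":
                state, quote = "script-string", ch
        elif state == "script-string":
            if ch == "\\":
                i += 1                            # skip the escaped character
            elif ch == quote:
                state = "script-other"
        i += 1
    return state


sample = """<!DOCTYPE html>
<html lang="en">
<head> <title>Hello, world!</title> </head>
<body>
<h1 style="[1]">Hello, world!</h1>
<p>hello [2] world</p>
<script> var a = "hello[3]"; [4] </script>
</body>
</html>"""

for marker in ("[1]", "[2]", "[3]", "[4]"):
    print(marker, context_at(sample, sample.index(marker)))
```

The cost is linear in the position, with constant memory, which matches the efficiency requirement in the question.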

python java node.js ruby html-parsing

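Answers: 0, votes: 0

How do I extract specific data from HTML?

I need to extract the following data from HTML: the date, the time, all four "Jogos" times, and whether a check mark symbol appears the first, second, third, or fourth time. But the code is not ...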

python-3.x beautifulsoup html-parsing

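Answers: 2, votes: 0

What is the late-binding equivalent of getElementsByTagName for HTML parsing in VBA?

Given the following HTML code: blabla I want to use getElementsByTagName in VBA to get the result blabla. I have been searching around, but all the posts I have seen ...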

vba html-parsing

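Answers: 0, votes: 0

Python script (BeautifulSoup) returns NoneType

I am trying to parse the source code of Merriam-Webster thesaurus entries for synonyms and antonyms. Here is a sample of the source code for the word kind: view-source Trying to extract syno ...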

python html beautifulsoup html-parsing meta

Answers: 0, votes: 0

PHP - replace <img> tags and return the src

The task is to replace every <img> tag in a given string with a <div> tag that has the src attribute as its inner text. While looking for an answer, I found a similar question.

    <?php
    $content = "this is something with an <img src=\"test.png\"/> in it.";
    $content = preg_replace("/<img[^>]+\>/i", "(image) ", $content);
    echo $content;
    ?>

Result: this is something with an (image) in it.

Question: how can I upgrade the script to get this result: this is something with an <div>test.png</div> in it.

This is the kind of problem PHP's DOMDocument class is good at:

    $dom = new DOMDocument();
    $dom->loadHTML($content);
    foreach ($dom->getElementsByTagName('img') as $img) {
        // put your replacement code here
    }
    $content = $dom->saveHTML();

    // Regex variant: capture the value of src and wrap it in a <div>.
    $content = "this is something with an <img src=\"test.png\"/> in it.";
    $content = preg_replace('/<img[^>]*\bsrc="([^"]*)"[^>]*>/i', '<div>$1</div>', $content);
    echo $content;

    <?php
    $strings = 'awdaw <img src="http://ua1.us/media/media.jpg" alt="Image" width="100" height="100"> aw <img src="http://ua1.us/media/media1awdwa.jpg"> wawadwad';
    // Collect every <img> tag, then replace each one with its src value.
    preg_match_all('/<img[^>]+>/i', $strings, $images);
    foreach ($images[0] as $image) {
        preg_match('/src="([^"]+)/i', $image, $replacements);
        // Fall back to the literal text "image" when no src is found.
        $replacement = isset($replacements[1]) ? $replacements[1] : (isset($replacements[0]) ? $replacements[0] : "image");
        $strings = str_replace($image, $replacement, $strings);
    }
    echo $strings;

php regex html-parsing

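Answers: 3, votes: 0

Cannot get a loop to work when parsing HTML with Beautiful Soup 4

I am using the Beautiful Soup documentation to help me understand how to implement it. I am not very familiar with Python as a whole, so maybe I have made a syntax error, but I don't think so. The code ...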

python parsing beautifulsoup html-parsing

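Answers: 1, votes: 0

Selenium cannot locate a class

I have run into a problem locating a class with Selenium. I have tried everything I can think of to locate the class attribute and perform some actions, such as: driver......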

python selenium html-parsing

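Answers: 1, votes: 0

Selecting class names that contain additional words in an HTML parser

I want to scrape a web page and get the reviews. The reviews fall into three classes: some are positive, some neutral, and some negative. I am using an HTML parser and have accessed ...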
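One way to handle class attributes that carry an extra word is to split the attribute into tokens and test for the shared one. The sketch below uses only the standard library. The class names (`review`, `positive`, `negative`), the sample markup, and the `ReviewCollector` name are all hypothetical, since the question does not show the real page.

```python
from html.parser import HTMLParser


class ReviewCollector(HTMLParser):
    """Collects the text of elements whose class list contains a base token."""

    def __init__(self, base_class):
        super().__init__()
        self.base_class = base_class
        self.results = []     # (extra class words, text) pairs
        self._depth = 0       # greater than 0 while inside a matching element
        self._tag = None
        self._extra = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == self._tag:
                self._depth += 1          # nested element with the same name
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.base_class in classes:
            self._tag, self._depth = tag, 1
            self._extra = [c for c in classes if c != self.base_class]
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.results.append((self._extra, "".join(self._buf).strip()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


page = (
    '<div class="review positive">Great</div>'
    '<div class="review negative">Bad <b>service</b></div>'
    '<div class="other">ignored</div>'
)
collector = ReviewCollector("review")
collector.feed(page)
print(collector.results)
```

Matching on tokens rather than on the whole attribute string means `review positive` and `review negative` are both found, and the extra word is kept as the label for each review.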