Questions tagged beautifulsoup

I need to find multiple <tr> elements that contain a <td> with specific text (10.13.18.150, StreaNetwork):

    <tr id="fr119" onclick="fr_toggle(119)" ondblclick="document.location='firewall_rules_edit.php?id=120';" class="ui-sortable-handle" style="">
      <td>
        <input type="checkbox" id="frc119" onclick="fr_toggle(119)" name="rule[]" value="120">
      </td>
      <td title="traffic is passed">
        <a href="?if=lan&act=toggle&id=120" usepost="">
          <i class="fa fa-check text-success" title="click to toggle enabled/disabled status"></i>
        </a>
        <i class="fa fa-cog" title="advanced setting: gateway PeakJioAirtel " style="cursor: pointer;"></i>
      </td>
      <td> 10.13.18.150 </td>
      <td> StreaNetwork </td>
    </tr>

My code:

    from bs4 import BeautifulSoup
    complete_soup = BeautifulSoup(html_data, 'html.parser')
    complete_soup.find('tr:has(td:contains("StreaNetwork"))')

Is there a solution to this problem?

Answer 1: Find all <tr> elements instead of testing for the StreaNetwork <td> inside the selector. Loop over every <tr> and collect its <td> tags, then loop over those <td> tags and check whether the text of each one contains the string.

Answer 2: Switch from find() to select() so that the argument is treated as a CSS selector (find() interprets its first argument as a tag name), and your script will match the <tr>. If the content you want is always in that position, iterate over the ResultSet and print :nth-child(3):

    complete_soup = BeautifulSoup(html_data, 'html.parser')
    for e in complete_soup.select('tr:has(td:contains("StreaNetwork"))'):
        print(e.select_one(':nth-child(3)').get_text(' ', strip=True))
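The loop-based approach, which the answers describe only in words, might look like the following. This is a minimal sketch, not the answerer's code: html_data is a trimmed stand-in for the row in the question plus one hypothetical non-matching row, and find_rows_with_text is a helper name introduced here for illustration, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Stand-in markup (assumption): the first row mirrors the question, the
# second is a hypothetical non-matching row added for contrast.
html_data = """
<table>
  <tr id="fr119"><td><input type="checkbox"></td><td title="traffic is passed"></td>
      <td> 10.13.18.150 </td><td> StreaNetwork </td></tr>
  <tr id="fr120"><td><input type="checkbox"></td><td title="traffic is passed"></td>
      <td> 10.13.18.151 </td><td> OtherNetwork </td></tr>
</table>
"""


def find_rows_with_text(soup, needle):
    """Return every <tr> with at least one <td> whose text contains needle."""
    matches = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if any(needle in td.get_text() for td in cells):
            matches.append(tr)
    return matches


complete_soup = BeautifulSoup(html_data, "html.parser")
rows = find_rows_with_text(complete_soup, "StreaNetwork")
for tr in rows:
    # Print the stripped text of each cell in the matching row.
    print([td.get_text(strip=True) for td in tr.find_all("td")])
```

If you prefer the selector route with select(), note that recent Soup Sieve releases spell the text-matching pseudo-class :-soup-contains() and treat the older :contains() spelling as deprecated.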

python-3.x web-scraping beautifulsoup

2 answers, 0 votes

How do I un-nest tags in Beautiful Soup?

I have an HTML document similar to this:

    <div>
      <h2>Title</h2>
      <div>
        <div>
          <div>
            <img alt="Some image" src="blah.gif"/>
          </div>
        </div>
      </div>

I want to extract it so that it ends up looking like this (that is, with the empty nested divs removed):

    <h2>Title</h2>
    <div>
      <img alt="Some image" src="blah.gif"/>
    </div>

I don't mind keeping the outer div if it contains something, but I want to remove any unnecessary nesting. To clarify: when a div contains another div and nothing else, I want to remove (unwrap) the inner div. So instead of:

    div>div>div>div>div>img

I just want:

    div>img

Answer: Here is a proof of concept I wrote; suggestions on the code are welcome. You can add conditions to the function test. The loop repeatedly finds elements that match those conditions and unwraps the outermost match.

    from bs4 import BeautifulSoup

    # Raw string so the backslash in the src attribute is kept literally.
    mytext = r"""
    <div>
     <h2>
      At least he didn't go in for the hug.
     </h2>
     <div>
      <div>
       <div>
        <img alt="At least he didn't go in for the hug." src="handshake-fails-are-embarrassing\9lmzspj.gif"/>
       </div>
      </div>
     </div>
    """

    # The output below is wrapped in <html><body>, which the lxml parser adds.
    soup = BeautifulSoup(mytext, "lxml")

    def test(x):
        """Return True if x is a bare wrapper around one child tag of the same name."""
        children = x.find_all(recursive=False)
        # only one child
        cri_1 = len(children) == 1
        # same name as its child
        cri_2 = cri_1 and children[0].name == x.name
        # no attributes, only a tag name
        cri_3 = len(x.attrs) == 0
        return cri_1 and cri_2 and cri_3

    # Unwrap the first (outermost) match until nothing matches any more.
    while soup.find_all(test):
        soup.find(test).unwrap()

    print(soup.prettify())

Output:

    <html>
     <body>
      <div>
       <h2>
        At least he didn't go in for the hug.
       </h2>
       <div>
        <img alt="At least he didn't go in for the hug." src="handshake-fails-are-embarrassing\9lmzspj.gif"/>
       </div>
      </div>
     </body>
    </html>

python beautifulsoup

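1 answer, 0 votes

Scraping latitude and longitude from a website

I want to use the data from this website to convert a list of ZIP codes into a DataFrame of latitudes and longitudes: Free Map Tools. https://www.freemaptools.com/convert-us-zip-code-to-lat-lng.htm#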

python-3.x web-scraping beautifulsoup python-requests

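1 answer, 0 votes

How do I handle the Google consent page when scraping article data through Google News RSS links?

I have a list of Google News links from a Google RSS feed, and I want to get the full text of those articles. I am using the BeautifulSoup library to scrape the data; however, Google seems to redirect...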

python web-scraping beautifulsoup rss

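1 answer, 0 votes

Unable to produce results containing the addresses I want

I am trying to create a script with the requests module and the BeautifulSoup library that does the following on this website: select the Strata plan number button, enter 11 in the input box, ...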

python python-3.x web-scraping beautifulsoup python-requests

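1 answer, 0 votes

How to extract a table from Wikipedia using BeautifulSoup and pandas

I am trying to extract a table from a Wikipedia page and display it in a pandas DataFrame. Here is my code:

    from bs4 import BeautifulSoup
    import requests
    import pandas as pd
    url = "https://en.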
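Because the question's code is cut off, the following is only a rough sketch of one common way to turn an HTML table into a DataFrame with BeautifulSoup and pandas. It parses an inline table instead of fetching a page; the markup, the wikitable class lookup, and the column values are illustrative assumptions, not data from the question.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for a downloaded page (assumption): a small table
# using the "wikitable" class that Wikipedia tables commonly carry.
html = """
<table class="wikitable">
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td>France</td><td>Paris</td></tr>
  <tr><td>Japan</td><td>Tokyo</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable")

# Collect the text of every header or data cell, one list per table row.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]

# Treat the first row as the header and the rest as data.
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df)
```

For a real page, the html string would come from the body of an HTTP response, and tables with merged cells would need extra handling that this sketch does not attempt.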

python pandas web-scraping beautifulsoup

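1 answer, 0 votes

Extracting the list of items in a dropdown menu using BeautifulSoup

I have tried many approaches, but this website is proving hard to scrape with bs4. Here is the URL: https://www.nseindia.com/option-chain I am trying to extract...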

web-scraping beautifulsoup

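1 answer, 0 votes

Unable to scrape different company names from a static webpage using the requests module

I created a script that uses the requests module to collect different company names from this website, but when I run the script it ends up getting nothing. I looked for the company names...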

python python-3.x web-scraping beautifulsoup python-requests

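3 answers, 0 votes

Scraping lyrics with beautifulsoup

Using the Genius API, I obtained the URL of a song's lyrics page. I now want to scrape that page with beautifulsoup4; however, I am running into an error. Here is the code: from bs4 import BeautifulSoup import requests...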

python html beautifulsoup

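4 answers, 0 votes

Scraping values from a JSON response with Python requests

I am building a size scraper for a website, but I am confused about how to extract "EUR" and "pieces" from this JSON. I want to print all sizes, for example "EU 41 = Pieces 6".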
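The JSON itself is not shown in the question, so the sketch below assumes a hypothetical shape: a top-level "sizes" list whose items carry "EUR" and "pieces" keys. The payload, the "sizes" key, and the format_sizes helper are all illustrative assumptions; only the "EUR" and "pieces" names and the output format come from the question.

```python
import json

# Hypothetical payload (assumption): the real response's structure is unknown.
raw = """
{"sizes": [
  {"EUR": "41", "pieces": 6},
  {"EUR": "42", "pieces": 0},
  {"EUR": "43", "pieces": 3}
]}
"""


def format_sizes(payload):
    """Parse the JSON text and build one 'EU <size> = Pieces <n>' line per size."""
    data = json.loads(payload)
    return [f"EU {item['EUR']} = Pieces {item['pieces']}" for item in data["sizes"]]


lines = format_sizes(raw)
for line in lines:
    print(line)
```

With the requests library, calling .json() on the response object returns the already-parsed structure, so the json.loads step would not be needed; the keys to walk would still have to match the real payload.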

python json web-scraping beautifulsoup python-requests

1 answer, 0 votes
