Python BeautifulSoup findAll formatting


I am trying to scrape some HTML files to build a machine-readable table of the information they display.

from bs4 import BeautifulSoup

with open('/path/to/some/html/file/M32.html') as f:
    soup = BeautifulSoup(f, 'html.parser')
search = soup.findAll('a')
print(search)

This produces:

[<a href="https://antismash.secondarymetabolites.org/">
<img alt="antiSMASH logo" src="images/bacteria_antismash_logo.svg" style="width:40px;height:unset;"/>
</a>, <a href="https://antismash.secondarymetabolites.org/">
          antiSMASH version 5.1.1
    </a>, <a href="#" id="download-dropdown-link"><img alt="download" src="images/download.svg"/>   Download</a>, <a href="M32.zip">Download all results</a>, <a href="M32.gbk">Download GenBank summary file</a>, <a href="https://antismash.secondarymetabolites.org/#!/about"><img alt="about" src="images/about.svg"/>   About</a>, <a href="https://docs.antismash.secondarymetabolites.org/"><img alt="help" src="images/help.svg"/>   Help</a>, <a href="https://antismash.secondarymetabolites.org/#!/contact"><img alt="contact" src="images/contact.svg"/>   Contact</a>, <a href="#">Overview</a>, <a href="#r2c1">2.1</a>, <a href="#r3c1">3.1</a>, <a href="#r5c1">5.1</a>, <a href="#r7c1">7.1</a>, <a href="#r13c1">13.1</a>, <a href="#r14c1">14.1</a>, <a href="#r15c1">15.1</a>, <a href="#r17c1">17.1</a>, <a href="#r19c1">19.1</a>, <a href="#r20c1">20.1</a>, <a href="#r25c1">25.1</a>, <a href="#r41c1">41.1</a>, <a href="#r42c1">42.1</a>, <a href="#r57c1">57.1</a>, <a href="#r61c1">61.1</a>, <a href="#r62c1">62.1</a>, <a href="#r78c1">78.1</a>, <a href="#r92c1">92.1</a>, <a href="#r100c1">100.1</a>, <a href="#r107c1">107.1</a>, <a href="#r112c1">112.1</a>, <a href="#r116c1">116.1</a>, <a href="#r148c1">148.1</a>, <a href="#r172c1">172.1</a>, <a href="#r240c1">240.1</a>, <a href="#r262c1">262.1</a>, <a href="#r292c1">2

Is there a way to format this so that each match is printed on its own line? With output this cluttered, it is hard to find what I am looking for.

Expected:

<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new

Tags: python, html, beautifulsoup

1 Answer

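findAll returns a list of tags, and printing the list itself puts everything on one bracketed line. Loop over it and print each tag separately: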

search = soup.findAll('a')  # returns a list-like ResultSet of matching tags

for tag in search:
    print(tag)
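
Note that a tag whose markup spans several lines (like the first two links in the question) will still print across several lines with the loop above. If you want strictly one line per match, or only the link target and text for a table, something along these lines may help. This is a rough sketch, not a tested solution for antiSMASH output: the helper names one_line and link_rows are made up for illustration, and the inline HTML is a hypothetical stand-in for M32.html.

```python
import re


def one_line(tag):
    """Return the tag's markup with every run of whitespace collapsed to one space."""
    return re.sub(r"\s+", " ", str(tag)).strip()


def link_rows(tags):
    """Yield (href, text) pairs; href is None when the attribute is missing."""
    for tag in tags:
        yield tag.get("href"), tag.get_text(strip=True)


def main():
    # Imported here so the helpers above stay usable without bs4 installed.
    from bs4 import BeautifulSoup

    # Hypothetical stand-in for M32.html, shaped like the output in the question.
    html = """
    <a href="https://antismash.secondarymetabolites.org/">
          antiSMASH version 5.1.1
    </a>
    <a href="M32.zip">Download all results</a>
    <a href="#r2c1">2.1</a>
    """
    soup = BeautifulSoup(html, "html.parser")
    search = soup.find_all("a")  # same result as findAll, the older spelling

    # One match per printed line, even when the markup contains newlines.
    for tag in search:
        print(one_line(tag))

    # Tab-separated href and link text, closer to a machine-readable table.
    for href, text in link_rows(search):
        print(f"{href}\t{text}")


if __name__ == "__main__":
    main()
```

To run it against the real file, replace the inline html string with the contents of your own file, opened the same way as in the question.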