Getting an empty href from <a> tags in Python

Question

I am trying to get the page links from a page of a football website. I have pulled out the <a> tags, but when I try to get the 'href', it comes out empty.

I have tried several different approaches after looking through Stack Overflow.

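from bs4 import BeautifulSoup

with open("premier_league/page_2.html", encoding="utf-8") as f:
    page = f.read()
# Parse the text that was just read (the variable is `page`, not `html`).
parsed_page = BeautifulSoup(page, "html.parser")
links = parsed_page.find_all("a")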

Answer 1

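You are very close; you just need to narrow your search. Right now you are grabbing the entire contents of every link, so you end up with several layers of XML (I believe; it has been 10 years since I last touched BeautifulSoup).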
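To illustrate the point above, here is a minimal sketch of what find_all("a") actually hands back. The sample markup is hypothetical and stands in for the saved page; the only assumption is that bs4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the saved page.
html = '<p><a href="/fixtures">Fixtures</a> <a name="top">Top</a></p>'

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a")

# Each item is a Tag object (the whole element), not a URL string.
first = links[0]
print(type(first).__name__)   # Tag
print(first)                  # <a href="/fixtures">Fixtures</a>

# The URL lives in the tag's attributes; .get() returns None when it is absent.
hrefs = [a.get("href") for a in links]
print(hrefs)                  # ['/fixtures', None]
```

So the list of tags is not empty; the href simply has to be read from each tag, and some anchors may not have one at all.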

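You just need to specify further that you are looking specifically for the href attribute inside each <a> element. I found another question, very similar to yours, that has already been answered more thoroughly, but here is the gist of it: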

You can use find_all in the following way to find every a element that has an href attribute, and print each one:

# Python 2 (BeautifulSoup 3)
from BeautifulSoup import BeautifulSoup

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

soup = BeautifulSoup(html)

# BeautifulSoup 3 spells the method findAll; find_all is the bs4 name.
for a in soup.findAll('a', href=True):
    print "Found the URL:", a['href']

# The output would be:
# Found the URL: some_url
# Found the URL: another_url

# Python3
from bs4 import BeautifulSoup

html = '''<a href="https://some_url.com">next</a>
<span class="class">
<a href="https://some_other_url.com">another_url</a></span>'''

soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid a warning

for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])

# The output would be:
# Found the URL: https://some_url.com
# Found the URL: https://some_other_url.com
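Applied to a saved page, the same idea might look like the sketch below. The helper name extract_links, the sample markup, and the base URL https://example.com/ are all hypothetical; urljoin is used only on the assumption that the site's links may be relative.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def extract_links(html_text, base_url):
    """Return absolute URLs for every <a> that carries an href attribute."""
    soup = BeautifulSoup(html_text, "html.parser")
    # href=True skips anchors without an href, so a["href"] cannot raise KeyError.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

# Hypothetical markup; the second anchor has no href and is skipped.
sample = '''<a href="/clubs/1">Club</a>
<a>no link here</a>
<a href="https://example.com/news">News</a>'''

page_links = extract_links(sample, "https://example.com/")
print(page_links)
# ['https://example.com/clubs/1', 'https://example.com/news']
```

To run it against the file from the question, read premier_league/page_2.html into a string and pass that string in place of sample, together with the real site's base URL.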

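I am not sure which version of Python or BeautifulSoup you are using, but you should pay attention to that, because a few small things changed in the newer versions, and they can trip you up if you are not aware of them.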
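A quick way to check which versions are in play is sketched below. It assumes the bs4 package is the one installed; if the import fails, you are likely on the older BeautifulSoup 3 package, which is imported as BeautifulSoup instead.

```python
import sys
import bs4

# Interpreter version as major.minor.micro.
python_version = ".".join(str(n) for n in sys.version_info[:3])
print("Python:", python_version)

# Version string of the installed bs4 package.
print("BeautifulSoup:", bs4.__version__)
```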
