Can't print links when scraping news

Question

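I want a list of the titles and links of the relevant news articles on the page. I can print the titles on their own, but for some reason the links still don't show up.

This is the Python code I'm using. I expected a big blob of HTML followed by a list of titles and links. All I get is the blob and the titles. I'm not sure what the problem is, so if anyone knows what's going on, I'd appreciate some insight.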

import requests
from bs4 import BeautifulSoup

url = 'https://ca.finance.yahoo.com/industries/energy/'

response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

articles = soup.find_all('h3', class_ = "Mb(5px)") #all of the relevant articles loaded on the page  


print(articles)                                    # prints the html including the title and link 

for items in articles:  
    
    title = items.text                            #titles print no problem
    link = items.absolute_links                   #no links are being printed 
    print(title , link)

Answer 1

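I don't know where you found this code, but it doesn't work because the elements you get back have no "absolute_links" attribute. What you can do instead is take the <a> tag from each element and read the link from its "href" attribute, if there is one. For external sites you will get an absolute link this way; for links within the site you only get a path relative to the base "https://ca.finance.yahoo.com/". The best approach is to check whether each link is absolute and, if it is not, prepend the base. For example, you could check whether it starts with https:// or something similar. There are also modules that may cover all of these cases for you, but that is a matter of taste and of your project's dependency requirements.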

import requests
from bs4 import BeautifulSoup

url = 'https://ca.finance.yahoo.com/industries/energy/'

response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

articles = soup.find_all('h3', class_ = "Mb(5px)") 

for items in articles:
    try:
        title = items.text                            
        link = items.a['href']                    
        print(title , link)
        # add some logic to turn relative to absolute links
    except KeyError:
        # no href in a
        pass
    except TypeError:
        # no a in item
        pass
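
For the "turn relative to absolute links" step, one option is urljoin from the standard library's urllib.parse, which leaves absolute URLs untouched and resolves relative ones against a base. This is only a minimal sketch: the helper name to_absolute and the example hrefs are made up for illustration, and the base is assumed to be the site root mentioned above.

```python
from urllib.parse import urljoin

# Assumed base: the site root the relative links are resolved against.
BASE_URL = "https://ca.finance.yahoo.com/"


def to_absolute(href, base=BASE_URL):
    """Return href unchanged if it is already absolute, else resolve it against base."""
    return urljoin(base, href)


# Hypothetical href values, standing in for what items.a['href'] might return.
examples = [
    "/news/some-article.html",    # site-internal, relative link
    "https://example.com/story",  # external, already absolute
]

for href in examples:
    print(to_absolute(href))
```

In the loop above, this would replace the placeholder comment: call to_absolute(items.a['href']) before printing the link.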
