Trying to scrape URLs


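I'm trying to collect all the URLs from a free-games site, but the result always comes back empty. I don't know what I'm doing wrong. The image below shows the path.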

import requests
from bs4 import BeautifulSoup

result = requests.get("https://steamdb.info/upcoming/free/")
src = result.content
soup = BeautifulSoup(src, 'lxml')

urls = []
for td_tag in soup.find_all('td'):
    a_tag = td_tag.find('a')
    urls.append(a_tag.attrs['href'])

print(urls)

[Image: screenshot of the element path]

python beautifulsoup screen-scraping
1 Answer

You have to send a User-Agent header, and the short form Mozilla/5.0 is not enough: it must be the full string that a real web browser sends.

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0",
}

result = requests.get("https://steamdb.info/upcoming/free/", headers=headers)
soup = BeautifulSoup(result.content, 'lxml')

#print(result.content)
urls = []
for td_tag in soup.find_all('td'):
    a_tag = td_tag.find('a')
    if a_tag:
        urls.append(a_tag.attrs['href'])

print(urls)
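
To see why the header matters, it can help to compare what requests sends by default with what the code above sends. The sketch below does this without any network traffic by preparing the request instead of sending it. The helper name effective_user_agent is made up for illustration, and the assumption (not confirmed by the site) is that the server rejects the library's default python-requests/<version> identifier.

```python
import requests

# Browser string taken from the answer above.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"
URL = "https://steamdb.info/upcoming/free/"


def effective_user_agent(session, extra_headers=None):
    """Hypothetical helper: return the User-Agent the session would send.

    The request is only prepared, never sent, so this runs offline.
    """
    request = requests.Request("GET", URL, headers=extra_headers)
    # prepare_request merges the session's default headers with ours.
    prepared = session.prepare_request(request)
    return prepared.headers["User-Agent"]


session = requests.Session()

# What the question's code sends: the library default.
default_ua = effective_user_agent(session)

# What the answer's code sends: the explicit browser string.
browser_ua = effective_user_agent(session, {"User-Agent": BROWSER_UA})

print("default:", default_ua)
print("browser:", browser_ua)
```

If the list is still empty after changing the header, printing result.status_code and the first few hundred characters of result.text is a quick way to check whether the server returned the real page or an error or challenge page.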