I am using bs4 to try to print out the following structure:
Talent Build Name Staggering Blow Build
Category Recommended
Level 1 On The Prowl
Level 4 Hogger's Jogger's
Level 7 Seeing Red
Level 10 Shockwave
Level 13 Pummel
Level 16 Headbanger
Level 20 No Control
Talent Build Name Ez-Thro Dynamite Build
Category Situational
Level 1 ...
Specs: VS Code, Windows 10, Python 3.12.1, bs4 version 4.12.3, requests version 2.31.0
Site being scraped: https://www.icy-veins.com/heroes/hogger-talents
bs4 resource: https://blog.logrocket.com/build-python-web-scraper-beautiful-soup/
The Python code is as follows:
from bs4 import BeautifulSoup
import requests
#heroname = input("Enter hero name:")
def fetch_talent_html():
    # make a request to the target website
    r = requests.get("https://www.icy-veins.com/heroes/hogger-talents")
    if r.status_code == 200:
        # if the request is successful return the HTML content
        return r.text
    else:
        # throw an exception if an error occurred
        raise Exception("an error occurred while fetching icyveins html")
def extract_talents_info(html):
    # Create a BeautifulSoup object to parse the HTML content
    soup = BeautifulSoup(html, 'html.parser')
    # talentdiv = soup.find(class_='heroes_builds')
    # print(talentdiv.prettify())
    heroes_builds_collection = soup.find(class_='heroes_builds')
    heroes_builds = heroes_builds_collection.find_all("heroes_build")[1:]
    print(heroes_builds)
    # iterate through our builds
    builds = []
    for builds in heroes_builds:
        talent_collection = builds.find("div", {"class": "heroes_build_talents"})
        talents = talent_collection.find_all("heroes_build_talent_tier")[1:]
        talent = []
        for talent in talents:
            img_tag = talent.find('img')
            talent.append({
                "Level": talent.class_,
                "Ability": img_tag.get('alt'),
                # "name": talent.find("h3", {"class": "toc_no_parsing"})["data-sort"],
            })
        # extract the information needed using our observations
        builds.append({
            "Talent Build Name": talent.h3,
            # "name": talent.find("h3", {"class": "toc_no_parsing"})["data-sort"],
            "Category": talent.span.text.strip(),
            "Category2": talent.span.text.strip(),
            "Talents": talent
            # "change_24h": talent.find("td", {"class": "td-change24h"}).text.strip(),
        })
    return builds
# fetch talent's HTML content
html = fetch_talent_html()
# extract our data from the HTML document
builds = extract_talents_info(html)
# display the scraper results
for build in builds:
    print(build, "\n")
print(builds)
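I run the code in VS Code and expect to see a list of talent builds, levels, and abilities.
When I modify the heroes_builds variable, I get an error saying there is no h3 tag, so I have a feeling I am close. I am just not there yet. Any insight is appreciated!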
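There are a few different issues, so I will just focus on the essentials. Try to iterate over the HTML tree the way you would read the page as a human, selecting the information you need: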
def extract_talents_info(html):
    soup = BeautifulSoup(html, 'html.parser')
    builds = []
    for b in soup.select('.heroes_build'):
        builds.append({
            'build_name': b.h3.get_text(),
            'category': b.span.text.strip(),
            'talents': [
                {
                    'level': t.span.get_text(),
                    'ability': t.img.get('alt')
                }
                for t in b.select('.heroes_build_talent_tier')
            ]
        })
    return builds
Result:
[{'build_name': 'Staggering Blow Build',
'category': 'Recommended',
'talents': [{'level': 'Level 1', 'ability': 'On The Prowl Icon'},
{'level': 'Level 4', 'ability': "Hogger's Joggers Icon"},
{'level': 'Level 7', 'ability': 'Seeing Red Icon'},
{'level': 'Level 10', 'ability': 'Shockwave Icon'},
{'level': 'Level 13', 'ability': 'Pummel Icon'},
{'level': 'Level 16', 'ability': 'Headbanger Icon'},
{'level': 'Level 20', 'ability': 'No Control Icon'}]},
{'build_name': 'Ez-Thro Dynamite Build',
'category': 'Situational',
'talents': [{'level': 'Level 1', 'ability': 'Journeyman Cooking Icon'},
{'level': 'Level 4', 'ability': "Hogger's Joggers Icon"},
{'level': 'Level 7', 'ability': 'Dense Blasting Powder Icon'},
{'level': 'Level 10', 'ability': 'Shockwave Icon'},
{'level': 'Level 13', 'ability': 'Pummel Icon'},
{'level': 'Level 16', 'ability': 'Kablooie! Icon'},
{'level': 'Level 20', 'ability': 'No Control Icon'}]},
{'build_name': 'ARAM Build',
'category': 'ARAM',
'talents': [{'level': 'Level 1', 'ability': 'Journeyman Cooking Icon'},
{'level': 'Level 4', 'ability': "Hogger's Joggers Icon"},
{'level': 'Level 7', 'ability': 'Dense Blasting Powder Icon'},
{'level': 'Level 10', 'ability': 'Shockwave Icon'},
{'level': 'Level 13', 'ability': 'Pummel Icon'},
{'level': 'Level 16', 'ability': 'Kablooie! Icon'},
{'level': 'Level 20', 'ability': 'No Control Icon'}]}]
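If you want the plain-text layout from the question rather than a list of dicts, something like the following may work. Treat it as a sketch: `format_builds` is a hypothetical helper name of my own, and it assumes the dict keys shown in the result above and that the image alt text usually ends in " Icon", as it does in that output. The sample data is copied from the first build in the result, shortened to two talents.

```python
def format_builds(builds):
    """Render build dicts (shaped like the result above) as plain text."""
    suffix = " Icon"
    lines = []
    for build in builds:
        lines.append(f"Talent Build Name {build['build_name']}")
        lines.append(f"Category {build['category']}")
        for talent in build["talents"]:
            ability = talent["ability"]
            # Assumption: the alt text carries a trailing " Icon"; strip it if present.
            if ability.endswith(suffix):
                ability = ability[: -len(suffix)]
            lines.append(f"{talent['level']} {ability}")
    return "\n".join(lines)


# Sample copied from the result above (first build, first two talents only).
sample = [
    {
        "build_name": "Staggering Blow Build",
        "category": "Recommended",
        "talents": [
            {"level": "Level 1", "ability": "On The Prowl Icon"},
            {"level": "Level 4", "ability": "Hogger's Joggers Icon"},
        ],
    }
]

print(format_builds(sample))
# Talent Build Name Staggering Blow Build
# Category Recommended
# Level 1 On The Prowl
# Level 4 Hogger's Joggers
```

With the real page you would pass the return value of `extract_talents_info(html)` instead of `sample`.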