Using bs4 to build a list of objects that ends up empty

Question (votes: 0, answers: 1)

I am using bs4 to try to print out the following structure:

Talent Build Name Staggering Blow Build
Category Recommended 
Level 1 On The Prowl
Level 4 Hogger's Jogger's
Level 7 Seeing Red
Level 10 Shockwave
Level 13 Pummel
Level 16 Headbanger
Level 20 No Control
Talent Build Name Ez-Thro Dynamite Build
Category Situational
Level 1 ...

Specs: VS Code, Windows 10, Python 3.12.1, bs4 version 4.12.3, requests version 2.31.0

Site being scraped: https://www.icy-veins.com/heroes/hogger-talents

bs4 resource: https://blog.logrocket.com/build-python-web-scraper-beautiful-soup/

The Python code is as follows:

from bs4 import BeautifulSoup
import requests

#heroname = input("Enter hero name:")


def fetch_talent_html():
    # make a request to the target website
    r = requests.get("https://www.icy-veins.com/heroes/hogger-talents")
    if r.status_code == 200:
        # if the request is successful return the HTML content
        return r.text
    else:
        # throw an exception if an error occurred
        raise Exception("an error occurred while fetching icyveins html")
    
def extract_talents_info(html):
    # Create a BeautifulSoup object to parse the HTML content
    soup = BeautifulSoup(html, 'html.parser')

   # talentdiv = soup.find(class_='heroes_builds')
   # print(talentdiv.prettify())
    heroes_builds_collection = soup.find(class_='heroes_builds')
    heroes_builds = heroes_builds_collection.find_all("heroes_build")[1:]
    print(heroes_builds)
    # iterate through our builds
    builds = []
    for builds in heroes_builds:  
        talent_collection = builds.find("div", {"class": "heroes_build_talents"})
        talents = talent_collection.find_all("heroes_build_talent_tier")[1:]
        talent = []
        for talent in talents:
            img_tag = talent.find('img')
            talent.append({
            "Level": talent.class_,
            "Ability": img_tag.get('alt'),
           # "name": talent.find("h3", {"class": "toc_no_parsing"})["data-sort"],
        })
        # extract the information needed using our observations
        builds.append({
            "Talent Build Name": talent.h3,
           # "name": talent.find("h3", {"class": "toc_no_parsing"})["data-sort"],
            "Category": talent.span.text.strip(),
            "Category2": talent.span.text.strip(),
            "Talents": talent
           # "change_24h": talent.find("td", {"class": "td-change24h"}).text.strip(),
        })
    return builds


# fetch talent's HTML content
html = fetch_talent_html()

# extract our data from the HTML document
builds = extract_talents_info(html)

# display the scraper results
for build in builds:
    print(build, "\n")

print(builds)

I ran the code in VS Code, expecting to see a list of talent builds, levels, and abilities.

When I modified the heroes_builds variable, I got an error saying there was no h3 tag, so I have a feeling I am close. I'm just not there yet. Any insight is appreciated!

python-3.x web-scraping beautifulsoup python-requests
1 Answer
0 votes

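There are a few different issues, so I will focus on the essentials: try to iterate over the HTML tree the way you would read the page as a human, and pick out the information you need: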

def extract_talents_info(html):
    soup = BeautifulSoup(html, 'html.parser')
    builds = []
    for b in soup.select('.heroes_build'):
        builds.append({
            'build_name': b.h3.get_text(),
            'category': b.span.text.strip(),
            'talents': [
                {
                    'level':t.span.get_text(),
                    'ability': t.img.get('alt')
                } 
                for t in b.select('.heroes_build_talent_tier')
            ] 
        })

    return builds

This results in:

[{'build_name': 'Staggering Blow Build',
  'category': 'Recommended',
  'talents': [{'level': 'Level 1', 'ability': 'On The Prowl Icon'},
   {'level': 'Level 4', 'ability': "Hogger's Joggers Icon"},
   {'level': 'Level 7', 'ability': 'Seeing Red Icon'},
   {'level': 'Level 10', 'ability': 'Shockwave Icon'},
   {'level': 'Level 13', 'ability': 'Pummel Icon'},
   {'level': 'Level 16', 'ability': 'Headbanger Icon'},
   {'level': 'Level 20', 'ability': 'No Control Icon'}]},
 {'build_name': 'Ez-Thro Dynamite Build',
  'category': 'Situational',
  'talents': [{'level': 'Level 1', 'ability': 'Journeyman Cooking Icon'},
   {'level': 'Level 4', 'ability': "Hogger's Joggers Icon"},
   {'level': 'Level 7', 'ability': 'Dense Blasting Powder Icon'},
   {'level': 'Level 10', 'ability': 'Shockwave Icon'},
   {'level': 'Level 13', 'ability': 'Pummel Icon'},
   {'level': 'Level 16', 'ability': 'Kablooie! Icon'},
   {'level': 'Level 20', 'ability': 'No Control Icon'}]},
 {'build_name': 'ARAM Build',
  'category': 'ARAM',
  'talents': [{'level': 'Level 1', 'ability': 'Journeyman Cooking Icon'},
   {'level': 'Level 4', 'ability': "Hogger's Joggers Icon"},
   {'level': 'Level 7', 'ability': 'Dense Blasting Powder Icon'},
   {'level': 'Level 10', 'ability': 'Shockwave Icon'},
   {'level': 'Level 13', 'ability': 'Pummel Icon'},
   {'level': 'Level 16', 'ability': 'Kablooie! Icon'},
   {'level': 'Level 20', 'ability': 'No Control Icon'}]}]
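
As for why the original list came back empty: a bare string passed to find_all() is matched against tag names, so find_all("heroes_build") looks for <heroes_build> elements and finds none. The loop therefore never runs and the empty builds list is returned. The loop variables also reuse the names builds and talent, which would overwrite the lists once the loop did run. Below is a minimal sketch of the difference, which also prints the layout asked for in the question and drops the trailing " Icon" from the alt text. The sample markup and the format_builds helper are assumptions for illustration; only the class names are taken from the question, and the real page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names come from the question;
# the real page layout may differ.
SAMPLE_HTML = """
<div class="heroes_builds">
  <div class="heroes_build">
    <h3>Staggering Blow Build</h3>
    <span>Recommended</span>
    <div class="heroes_build_talents">
      <div class="heroes_build_talent_tier">
        <span>Level 1</span><img alt="On The Prowl Icon">
      </div>
      <div class="heroes_build_talent_tier">
        <span>Level 4</span><img alt="Hogger's Joggers Icon">
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A bare string is matched against tag names, so this searches for
# <heroes_build> elements and matches nothing.
by_tag_name = soup.find_all("heroes_build")

# class_= (or a CSS selector) matches the class attribute instead.
by_class = soup.find_all(class_="heroes_build")
by_selector = soup.select(".heroes_build")


def format_builds(soup):
    """Return the builds as text lines in the layout the question asks for."""
    lines = []
    for build in soup.select(".heroes_build"):
        lines.append(f"Talent Build Name {build.h3.get_text(strip=True)}")
        lines.append(f"Category {build.span.get_text(strip=True)}")
        for tier in build.select(".heroes_build_talent_tier"):
            level = tier.span.get_text(strip=True)
            # Drop the trailing " Icon" from the alt text (Python 3.9+).
            ability = tier.img.get("alt", "").removesuffix(" Icon")
            lines.append(f"{level} {ability}")
    return lines


print(len(by_tag_name), len(by_class), len(by_selector))
print("\n".join(format_builds(soup)))
```

Using distinct names for the loop variable and the accumulator (build vs. lines here) avoids the overwriting problem in the original code.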