How to select an option on the FIBA page using Playwright


I am trying to get the team statistics for the 2023 FIBA Basketball World Cup, for which my country (the Philippines) is one of the hosts.

The page URL is: https://www.fiba.basketball/basketballworldcup/2023/teamstats

The select element and its options on the page are:

"""
<select id="type_select" style="display: none;">
    <option id="type_select_points_per_game" value="PPG">Points per Game</option>
    <option id="type_select_points" value="PTS">Total Points</option>
    <option id="type_select_field_goals" value="FG">Field Goal Shooting</option>
    <option id="type_select_2_points" value="FG2">2 Point Field Goals</option>
    <option id="type_select_3_points" value="FG3">3 Point Field Goals</option>
    <option id="type_select_free_throws" value="FT">Free-Throws</option>
    <option id="type_select_rebounds" value="REB">Rebounds</option>
    <option id="type_select_blocks" value="BL">Blocks</option>
    <option id="type_select_assists" value="ASS">Assists</option>
    <option id="type_select_steals" value="ST">Steals</option>
    <option id="type_select_turn_overs" value="TO">Turn Overs</option>
    <option id="type_select_fouls" value="FO">Fouls</option>
    <option id="type_select_minutes" value="MIN">Minutes</option>
    <option id="type_select_efficiency" value="EFF">Efficiency</option>            
    <option id="type_select_double_doubles" value="DD">Double-Doubles</option>
</select>
"""

I am using Playwright to select one of the options, but the selection does not work.

This is my code for selecting the select element with the id type_select.

Selection code

try:
    page.locator('select#type_select').select_option(value='FG', timeout=60000)
except Exception as exc:
    print(f'Unexpected exception: {repr(exc)}')

Error message

Unexpected exception: TimeoutError('Timeout 60000ms exceeded.\n=========================== logs ===========================\nwaiting for locator("select#type_select")\n  locator resolved to <select id="type_select">…</select>\n  selecting specified option(s)\n    element is not visible - waiting...\n============================================================')

Full code

The code can only extract the table data for the default option.

"""Extract fiba world cup 2023 team stats.

url = 'https://www.fiba.basketball/basketballworldcup/2023/teamstats'

Install playwright:
    pip install playwright

Install the required browsers:
    playwright install
"""


from playwright.sync_api import sync_playwright


def scrape_team_stats(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # True and fiba might block it
        page = browser.new_page()
        page.goto(url)

        # Select option.

        """
        <select id="type_select" style="display: none;">
            <option id="type_select_points_per_game" value="PPG">Points per Game</option>
            <option id="type_select_points" value="PTS">Total Points</option>
            <option id="type_select_field_goals" value="FG">Field Goal Shooting</option>
            <option id="type_select_2_points" value="FG2">2 Point Field Goals</option>
            <option id="type_select_3_points" value="FG3">3 Point Field Goals</option>
            <option id="type_select_free_throws" value="FT">Free-Throws</option>
            <option id="type_select_rebounds" value="REB">Rebounds</option>
            <option id="type_select_blocks" value="BL">Blocks</option>
            <option id="type_select_assists" value="ASS">Assists</option>
            <option id="type_select_steals" value="ST">Steals</option>
            <option id="type_select_turn_overs" value="TO">Turn Overs</option>
            <option id="type_select_fouls" value="FO">Fouls</option>
            <option id="type_select_minutes" value="MIN">Minutes</option>
            <option id="type_select_efficiency" value="EFF">Efficiency</option>            
            <option id="type_select_double_doubles" value="DD">Double-Doubles</option>
        </select>
        """

        try:
            page.locator('select#type_select').select_option(value='FG', timeout=60000)
        except Exception as exc:
            print(f'Unexpected exception: {repr(exc)}')

        page.wait_for_selector("#team_stat_table")
        table = page.query_selector("table.comparative")
        rows = table.query_selector_all("tr")

        team_stats = []

        for row in rows:
            cells = row.query_selector_all("td")
            if cells:
                cell_values = []
                for cell in cells:
                    cell_text = cell.text_content()
                    cell_values.append(cell_text)
                team_stats.append(cell_values)

        browser.close()
        return team_stats


url = "https://www.fiba.basketball/basketballworldcup/2023/teamstats"
team_stats = scrape_team_stats(url)

print(team_stats)
# [['1.', 'Canada', '3', '200.0', '108.0', ...

I would like the selection to work so that I can get the team statistics for options other than the default one.

python web-crawler playwright-python
1 Answer

Let me answer my own question. Thanks to ggorlen for the suggestion.

Code to select the option.

label = "Total Points"  # "Points per Game", "Field Goal Shooting", ...

page.locator("span#type_selectSelectBoxIt").click()
page.locator("ul#type_selectSelectBoxItOptions").get_by_text(label).click()
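
Why this works, as far as I can tell: the native select in the question has style="display: none;", and the error log shows select_option waiting on "element is not visible". The two clicks above go through the visible dropdown widget instead. Here is a rough sketch of how that could be folded into the question's scraper. The STAT_LABELS table, label_for and the value parameter are my own illustrative names (the labels are copied from the option markup in the question), and I am assuming the table has been refreshed by the time #team_stat_table resolves, so an extra wait may be needed.

```python
# Value code -> visible label, copied from the <select id="type_select"> markup.
STAT_LABELS = {
    "PPG": "Points per Game",
    "PTS": "Total Points",
    "FG": "Field Goal Shooting",
    "FG2": "2 Point Field Goals",
    "FG3": "3 Point Field Goals",
    "FT": "Free-Throws",
    "REB": "Rebounds",
    "BL": "Blocks",
    "ASS": "Assists",
    "ST": "Steals",
    "TO": "Turn Overs",
    "FO": "Fouls",
    "MIN": "Minutes",
    "EFF": "Efficiency",
    "DD": "Double-Doubles",
}


def label_for(value):
    """Return the visible label for a value code such as 'FG' (hypothetical helper)."""
    try:
        return STAT_LABELS[value]
    except KeyError:
        raise ValueError(f"Unknown stat type: {value!r}") from None


def scrape_team_stats(url, value="PTS"):
    # Imported lazily so label_for() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto(url)

        # Click the visible widget rather than the hidden <select>.
        page.locator("span#type_selectSelectBoxIt").click()
        options = page.locator("ul#type_selectSelectBoxItOptions")
        options.get_by_text(label_for(value)).click()

        # Assumption: the table already shows the new stat type at this point.
        page.wait_for_selector("#team_stat_table")

        team_stats = []
        for row in page.locator("table.comparative tr").all():
            cells = row.locator("td").all_text_contents()
            if cells:  # header rows have no <td> cells
                team_stats.append(cells)

        browser.close()
        return team_stats
```

Usage would look like scrape_team_stats(url, value="FG"), with url being the teamstats page from the question.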

Output for Total Points

[['1.', 'South Sudan', '4', '206.3', '355', '74-146', '50.7', '50-122', '41.0', '57-79', '72.2'], ['2.', 'New Zealand', '4', '206.3', ...

Output for Field Goal Shooting

[['1.', 'Serbia', '3', '104.7', '37.0', '64.7', '111', '194', '57.2'], ['2.', 'Canada', '3', '108.0', '39.7', '72.3', '119', '217', '54.8'], ['3.', 'USA', '3', '106.0', '36.3',
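
All cells come back as strings. If numbers are needed, something like the following could be applied to each row. This is only a sketch: convert_cell and convert_row are hypothetical helpers, and I am assuming that rank cells look like '1.' and that made-attempted cells such as '74-146' should stay as text.

```python
def convert_cell(text):
    """Turn '3' into 3, '104.7' into 104.7 and '1.' into 1; keep other text as is."""
    stripped = text.strip()
    # Rank cells carry a trailing dot, e.g. '1.'.
    candidate = stripped[:-1] if stripped.endswith(".") else stripped
    try:
        return int(candidate)
    except ValueError:
        pass
    try:
        return float(stripped)
    except ValueError:
        return stripped  # team names, '74-146', ...


def convert_row(row):
    """Convert every cell of one scraped table row."""
    return [convert_cell(cell) for cell in row]


row = ['1.', 'Serbia', '3', '104.7', '37.0', '64.7', '111', '194', '57.2']
print(convert_row(row))
```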