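I am trying to get the team statistics for the FIBA Basketball World Cup 2023, which my country (the Philippines) is co-hosting.
The page URL is: https://www.fiba.basketball/basketballworldcup/2023/teamstats
The select element on the page has these options: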
"""
<select id="type_select" style="display: none;">
<option id="type_select_points_per_game" value="PPG">Points per Game</option>
<option id="type_select_points" value="PTS">Total Points</option>
<option id="type_select_field_goals" value="FG">Field Goal Shooting</option>
<option id="type_select_2_points" value="FG2">2 Point Field Goals</option>
<option id="type_select_3_points" value="FG3">3 Point Field Goals</option>
<option id="type_select_free_throws" value="FT">Free-Throws</option>
<option id="type_select_rebounds" value="REB">Rebounds</option>
<option id="type_select_blocks" value="BL">Blocks</option>
<option id="type_select_assists" value="ASS">Assists</option>
<option id="type_select_steals" value="ST">Steals</option>
<option id="type_select_turn_overs" value="TO">Turn Overs</option>
<option id="type_select_fouls" value="FO">Fouls</option>
<option id="type_select_minutes" value="MIN">Minutes</option>
<option id="type_select_efficiency" value="EFF">Efficiency</option>
<option id="type_select_double_doubles" value="DD">Double-Doubles</option>
</select>
"""
I am using Playwright to select one of the options, but the selection fails.
This is my code for selecting the select element with the id type_select.
try:
    page.locator('select#type_select').select_option(value='FG', timeout=60000)
except Exception as exc:
    print(f'Unexpected exception: {repr(exc)}')
Unexpected exception: TimeoutError('Timeout 60000ms exceeded.\n=========================== logs ===========================\nwaiting for locator("select#type_select")\n locator resolved to <select id="type_select">…</select>\n selecting specified option(s)\n element is not visible - waiting...\n============================================================')
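The log line "element is not visible - waiting..." is consistent with the style="display: none;" on the select element: select_option waits for the element to become visible, and that never happens here. A minimal sketch for confirming this, assuming page is the Playwright page object from my script; describe_visibility is a hypothetical helper name, not part of Playwright:

```python
def describe_visibility(page, selector):
    """Report how many elements match a selector and whether the match is visible.

    `page` is assumed to be a Playwright sync-API Page (or anything exposing
    the same `locator()` interface).
    """
    locator = page.locator(selector)
    count = locator.count()
    # is_visible() returns immediately; it does not wait for the element.
    # It is only called for a single match to avoid a strict-mode error.
    visible = locator.is_visible() if count == 1 else False
    return {"selector": selector, "count": count, "visible": visible}
```

Calling describe_visibility(page, 'select#type_select') after page.goto(url) should show whether the element exists but is hidden. Given the inline display: none, I would expect visible to be False.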
The code can only extract the table data for the default option.
"""Extract fiba world cup 2023 team stats.
url = 'https://www.fiba.basketball/basketballworldcup/2023/teamstats'
Install playwright:
pip install playwright
Install the required browsers:
playwright install
"""
from playwright.sync_api import sync_playwright


def scrape_team_stats(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # True and fiba might block it
        page = browser.new_page()
        page.goto(url)

        # Select option.
        """
        <select id="type_select" style="display: none;">
        <option id="type_select_points_per_game" value="PPG">Points per Game</option>
        <option id="type_select_points" value="PTS">Total Points</option>
        <option id="type_select_field_goals" value="FG">Field Goal Shooting</option>
        <option id="type_select_2_points" value="FG2">2 Point Field Goals</option>
        <option id="type_select_3_points" value="FG3">3 Point Field Goals</option>
        <option id="type_select_free_throws" value="FT">Free-Throws</option>
        <option id="type_select_rebounds" value="REB">Rebounds</option>
        <option id="type_select_blocks" value="BL">Blocks</option>
        <option id="type_select_assists" value="ASS">Assists</option>
        <option id="type_select_steals" value="ST">Steals</option>
        <option id="type_select_turn_overs" value="TO">Turn Overs</option>
        <option id="type_select_fouls" value="FO">Fouls</option>
        <option id="type_select_minutes" value="MIN">Minutes</option>
        <option id="type_select_efficiency" value="EFF">Efficiency</option>
        <option id="type_select_double_doubles" value="DD">Double-Doubles</option>
        </select>
        """
        try:
            page.locator('select#type_select').select_option(value='FG', timeout=60000)
        except Exception as exc:
            print(f'Unexpected exception: {repr(exc)}')

        page.wait_for_selector("#team_stat_table")
        table = page.query_selector("table.comparative")
        rows = table.query_selector_all("tr")

        team_stats = []
        for row in rows:
            cells = row.query_selector_all("td")
            if cells:
                cell_values = []
                for cell in cells:
                    cell_text = cell.text_content()
                    cell_values.append(cell_text)
                team_stats.append(cell_values)

        browser.close()
        return team_stats


url = "https://www.fiba.basketball/basketballworldcup/2023/teamstats"
team_stats = scrape_team_stats(url)
print(team_stats)
# [['1.', 'Canada', '3', '200.0', '108.0', ...
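I would like the selection to work so that I can get team stats for options other than the default one.
Let me answer my own question. Thanks to ggorlen for the suggestion.
The code that makes the selection clicks the visible dropdown widget instead of the hidden select element: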
label = "Total Points" # "Points per Game", "Field Goal Shooting", ...
page.locator("span#type_selectSelectBoxIt").click()
page.locator("ul#type_selectSelectBoxItOptions").get_by_text(label).click()
[['1.', 'South Sudan', '4', '206.3', '355', '74-146', '50.7', '50-122', '41.0', '57-79', '72.2'], ['2.', 'New Zealand', '4', '206.3', ...
[['1.', 'Serbia', '3', '104.7', '37.0', '64.7', '111', '194', '57.2'], ['2.', 'Canada', '3', '108.0', '39.7', '72.3', '119', '217', '54.8'], ['3.', 'USA', '3', '106.0', '36.3',
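To collect several stat types in one run, the two clicks can be wrapped in a helper and combined with the table-reading loop from the question. This is only a sketch: the function names select_stat_type, extract_rows and scrape_stat_types are my own, exact=True is added so that a label cannot match another option by substring, and I am assuming the table has already been refreshed by the time it is read, which wait_for_selector alone does not guarantee.

```python
def select_stat_type(page, label):
    """Choose a stat type through the visible dropdown widget.

    The real <select id="type_select"> is hidden (display: none), so the
    clicks go to the span and list that replace it on the page.
    """
    page.locator("span#type_selectSelectBoxIt").click()
    options = page.locator("ul#type_selectSelectBoxItOptions")
    options.get_by_text(label, exact=True).click()


def extract_rows(page):
    """Read the stats table into a list of rows (same logic as the question)."""
    page.wait_for_selector("#team_stat_table")
    table = page.query_selector("table.comparative")
    stats = []
    for row in table.query_selector_all("tr"):
        cells = row.query_selector_all("td")
        if cells:  # header rows have no <td> cells and are skipped
            stats.append([cell.text_content() for cell in cells])
    return stats


def scrape_stat_types(page, labels):
    """Return {label: rows} for each requested stat type.

    Assumption: the table content is up to date when extract_rows runs.
    If stale rows show up, an extra wait may be needed after the click.
    """
    results = {}
    for label in labels:
        select_stat_type(page, label)
        results[label] = extract_rows(page)
    return results
```

Inside scrape_team_stats, after page.goto(url), this could be called as scrape_stat_types(page, ["Total Points", "Field Goal Shooting"]) in place of the select_option call.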