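I'll try to keep this brief. I'm trying to scrape some football data from fbref, but I'm having trouble loading matches that have not been played yet into my ML dataset. An example of what I want is in the "Scores & Fixtures" table here: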
https://fbref.com/en/squads/18bb7c10/Arsenal-Stats
When I run my code, it scrapes every match played so far this season, but not the upcoming fixtures I'm trying to predict.
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

# years, standings_url and all_matches are defined earlier (not shown here).
for year in years:
    # Collect the squad page URLs from the league standings table.
    data = requests.get(standings_url)
    soup = BeautifulSoup(data.text)
    standings_table = soup.select('table.stats_table')[0]
    links = [l.get("href") for l in standings_table.find_all('a')]
    links = [l for l in links if '/squads/' in l]
    team_urls = [f"https://fbref.com{l}" for l in links]

    # Point standings_url at the previous season for the next iteration.
    previous_season = soup.select("a.prev")[0].get("href")
    standings_url = f"https://fbref.com{previous_season}"

    for team_url in team_urls:
        team_name = team_url.split("/")[-1].replace("-Stats", "").replace("-", " ")
        data = requests.get(team_url)
        matches = pd.read_html(data.text, match="Scores & Fixtures")[0]

        # Find and load the team's shooting stats page.
        soup = BeautifulSoup(data.text)
        links = [l.get("href") for l in soup.find_all('a')]
        links = [l for l in links if l and 'all_comps/shooting/' in l]
        data = requests.get(f"https://fbref.com{links[0]}")
        shooting = pd.read_html(data.text, match="Shooting")[0]
        shooting.columns = shooting.columns.droplevel()

        # Join the match rows with the shooting stats by date.
        try:
            team_data = matches.merge(shooting[["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="Date")
        except ValueError:
            continue
        team_data = team_data[team_data["Comp"] == "Premier League"]
        team_data["Season"] = year
        team_data["Team"] = team_name
        all_matches.append(team_data)
        time.sleep(10)
I have tried adjusting the code to include a wider date range, and I have tried removing the header rows that separate them.
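
One possible cause worth checking is the merge on "Date". `DataFrame.merge` defaults to an inner join, so any fixture whose date has no row in the shooting table would be dropped. The sketch below reproduces that with toy data. The two frames, their dates, and their values are made up for illustration, and it is an assumption (not verified against the live fbref pages) that the shooting table only lists matches already played.

```python
import pandas as pd

# Hypothetical stand-in for the "Scores & Fixtures" table:
# two played matches and one upcoming fixture with no result yet.
matches = pd.DataFrame({
    "Date": ["2023-08-12", "2023-08-21", "2023-08-26"],
    "Comp": ["Premier League"] * 3,
    "Result": ["W", "W", None],
})

# Hypothetical stand-in for the shooting table (assumed to hold played matches only).
shooting = pd.DataFrame({
    "Date": ["2023-08-12", "2023-08-21"],
    "Sh": [15, 13],
    "SoT": [7, 2],
})

# Default merge is how="inner": rows without a match in both frames are dropped.
inner = matches.merge(shooting, on="Date")

# how="left" keeps every row of `matches`; missing shooting stats become NaN.
left = matches.merge(shooting, on="Date", how="left")

print(len(inner), len(left))
print(left[left["Result"].isna()])
```

If this is what is happening, the upcoming fixtures would come through a left merge with NaN in the shooting columns, which then need to be filled or handled before training.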