I am trying to scrape football player data from the FBRef website. What I get back from the site is a bs4.element.ResultSet object.
Code:
import re  # needed for re.compile below
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
res = requests.get("https://fbref.com/en/comps/9/stats/Premier-League-Stats")
# Remove the HTML comment markers so that markup inside comments is parsed too
comp = re.compile("<!--|-->")
soup = BeautifulSoup(comp.sub("", res.text), 'lxml')
all_data = soup.find_all("tbody")
player_data = all_data[2]
The data looks like this:
<tr><th class="right" **...** href="/en/players/774cf58b/Max-Aarons">Max Aarons</a></td><td **...** data-stat="position">DF</td><td class="left" data-stat="team"><a href="/en/squads/4ba7cbea/Bournemouth-Stats">Bournemouth</a></td><td class="center" data-stat="age">24-084</td><td class="center" data-stat="birth_year">2000</td><td**...** </a></td></tr>
<tr><th class="right" **...** href="/en/players/77816c91/Benie-Adama-Traore">Bénie Adama Traore</a></td><td **...** data-stat="position">FW,MF</td><td class="left" data-stat="team"><a href="/en/squads/1df6b87e/Sheffield-United-Stats">Sheffield Utd</a></td><td class="center" data-stat="age">21-119</td><td class="center" data-stat="birth_year">2002 **...** </a></td></tr>
**...**
I want to create a Pandas DataFrame from it, for example:
**Name Position Team Age Birth Year** **...**
Max Aarons DF Bournemouth 24 2000
Benie Adama Traore FW Sheffield Utd 21 2002
**...**
I looked at similar questions here and tried to apply their solutions, but could not get them to work.
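To create a Pandas DataFrame from the scraped data, you can iterate over the row tags, extract the relevant information from each one, and append it to a list. Finally, you can use that list to build the DataFrame. Here is how: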
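import re
import requests
from bs4 import BeautifulSoup
import pandas as pd
res = requests.get("https://fbref.com/en/comps/9/stats/Premier-League-Stats")
# Strip the HTML comment markers first, as the question's code does, so that
# index 2 refers to the same <tbody> as in the question.
soup = BeautifulSoup(re.sub("<!--|-->", "", res.text), 'lxml')
player_data = soup.find_all("tbody")[2]
data = []
for row in player_data.find_all("tr"):
    # Skip rows that have no position cell (for example, repeated header rows)
    if row.find("td", {"data-stat": "position"}) is None:
        continue
    name = row.find("a").text  # first link in the row is the player link
    position = row.find("td", {"data-stat": "position"}).text
    team = row.find("td", {"data-stat": "team"}).text
    age = row.find("td", {"data-stat": "age"}).text
    birth_year = row.find("td", {"data-stat": "birth_year"}).text
    data.append([name, position, team, age, birth_year])
df = pd.DataFrame(data, columns=['Name', 'Position', 'Team', 'Age', 'Birth Year'])
print(df)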
This code creates a DataFrame from the scraped data with the columns "Name", "Position", "Team", "Age", and "Birth Year".
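The desired output in the question shows the age as whole years (24 rather than 24-084) and a single position (FW rather than FW,MF), while the scraped cells hold the raw strings. A minimal sketch of one way to tidy each row before building the DataFrame follows. The helper names `age_in_years`, `primary_position`, and `clean_row` are hypothetical, the sample rows are copied from the question, and the code assumes the age is formatted as years-days and that the first listed position is the one wanted.

```python
from typing import List, Optional


def age_in_years(age: str) -> Optional[int]:
    """Return the years part of a 'years-days' string such as '24-084'.

    Returns None when the cell is empty or not numeric (assumed to be possible).
    """
    years = age.split("-")[0].strip()
    return int(years) if years.isdigit() else None


def primary_position(position: str) -> str:
    """Keep only the first position from a comma-separated list such as 'FW,MF'."""
    return position.split(",")[0].strip()


def clean_row(row: List[str]) -> list:
    """Tidy one [name, position, team, age, birth_year] row."""
    name, position, team, age, birth_year = row
    born = int(birth_year) if birth_year.strip().isdigit() else None
    return [name, primary_position(position), team, age_in_years(age), born]


# Sample rows in the same shape as the `data` list built by the loop above
rows = [
    ["Max Aarons", "DF", "Bournemouth", "24-084", "2000"],
    ["Bénie Adama Traore", "FW,MF", "Sheffield Utd", "21-119", "2002"],
]
cleaned = [clean_row(r) for r in rows]
for r in cleaned:
    print(r)
```

The cleaned list can be passed to pd.DataFrame with the same column names as before. Cleaning the rows first keeps the Age and Birth Year columns numeric from the start.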
pd.read_html reads the HTML directly into a DataFrame:
import re
from io import StringIO
import pandas as pd
import requests
res = requests.get("https://fbref.com/en/comps/9/stats/Premier-League-Stats")
comp = re.compile("<!--|-->")
df = pd.read_html(StringIO(comp.sub("", res.text)))[2] # <-- locate the right table
print(df)
Prints:
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Playing Time Performance Expected Progression Per 90 Minutes Unnamed: 36_level_0
Rk Player Nation Pos Squad Age Born MP Starts Min 90s Gls Ast G+A G-PK PK PKatt CrdY CrdR xG npxG xAG npxG+xAG PrgC PrgP PrgR Gls Ast G+A G-PK G+A-PK xG xAG xG+xAG npxG npxG+xAG Matches
0 1 Max Aarons eng ENG DF Bournemouth 24-085 2000 14 12 1085 12.1 0 1 1 0 0 0 1 0 0.0 0.0 0.8 0.8 19 40 22 0.00 0.08 0.08 0.00 0.08 0.00 0.07 0.07 0.00 0.07 Matches
1 2 Bénie Adama Traore ci CIV FW,MF Sheffield Utd 21-120 2002 8 3 387 4.3 0 0 0 0 0 0 0 0 0.3 0.3 0.5 0.8 7 9 14 0.00 0.00 0.00 0.00 0.00 0.06 0.13 0.19 0.06 0.19 Matches
2 3 Tyler Adams us USA MF Bournemouth 25-044 1999 1 0 20 0.2 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0 1 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Matches
3 4 Tosin Adarabioyo eng ENG DF Fulham 26-187 1997 15 13 1173 13.0 1 0 1 1 0 0 1 0 0.6 0.6 0.1 0.6 5 39 3 0.08 0.00 0.08 0.08 0.08 0.04 0.01 0.05 0.04 0.05 Matches
4 5 Elijah Adebayo eng ENG FW Luton Town 26-082 1998 23 13 1162 12.9 9 0 9 9 0 0 1 0 5.6 5.6 0.7 6.3 14 19 85 0.70 0.00 0.70 0.70 0.70 0.43 0.05 0.49 0.43 0.49 Matches
5 6 Simon Adingra ci CIV FW Brighton 22-088 2002 21 16 1446 16.1 6 1 7 6 0 0 2 0 3.1 3.1 2.3 5.4 72 32 199 0.37 0.06 0.44 0.37 0.44 0.19 0.14 0.34 0.19 0.34 Matches
...
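The frame printed above has two header levels, with placeholder group labels such as "Unnamed: 1_level_0", so it does not yet match the five-column layout asked for in the question. The sketch below shows one possible way to flatten the header and keep only those columns. It runs on a small hand-built stand-in for the scraped frame rather than on the live page; the helper name `flatten_columns` is hypothetical, and the repeated header row in the sample body is an assumption about what the real table may contain.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by pd.read_html (values taken
# from the printed output; the middle row is an ASSUMED repeated header row).
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 1_level_0", "Player"),
    ("Unnamed: 3_level_0", "Pos"),
    ("Unnamed: 4_level_0", "Squad"),
    ("Unnamed: 5_level_0", "Age"),
    ("Unnamed: 6_level_0", "Born"),
    ("Performance", "Gls"),
    ("Per 90 Minutes", "Gls"),
])
raw = pd.DataFrame(
    [
        ["Max Aarons", "DF", "Bournemouth", "24-085", "2000", "0", "0.00"],
        ["Player", "Pos", "Squad", "Age", "Born", "Gls", "Gls"],
        ["Bénie Adama Traore", "FW,MF", "Sheffield Utd", "21-120", "2002", "0", "0.00"],
    ],
    columns=columns,
)


def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse two header levels into one.

    Placeholder 'Unnamed: ...' groups are dropped; real group labels are kept
    as a prefix so that duplicate names such as 'Gls' stay distinct.
    """
    out = df.copy()
    out.columns = [
        name if str(group).startswith("Unnamed") else f"{group} {name}"
        for group, name in df.columns
    ]
    return out


flat = flatten_columns(raw)
# Drop any header rows repeated inside the table body
flat = flat[flat["Player"] != "Player"].reset_index(drop=True)

players = flat[["Player", "Pos", "Squad", "Age", "Born"]].rename(
    columns={"Player": "Name", "Pos": "Position", "Squad": "Team", "Born": "Birth Year"}
)
# Keep only the years part of the 'years-days' age string
players["Age"] = players["Age"].str.split("-").str[0].astype(int)
players["Birth Year"] = players["Birth Year"].astype(int)
print(players)
```

Prefixing the real group labels (for example "Performance Gls" and "Per 90 Minutes Gls") is a design choice that avoids duplicate column names after flattening.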