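I'm new to BeautifulSoup and wanted to try some web scraping. For my small project, I want to get the Golden State Warriors' win rate from Wikipedia. I was planning to load the table that holds that information into pandas so that I could plot it over the years. However, my code selects the Table Key table instead of the Seasons table. I know this is because they are the same type of table (wikitable), but I don't know how to solve it. I'm sure there is a simple explanation that I'm missing. Can someone explain how to fix my code, and how to choose which table to scrape in the future? Thanks!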
import urllib.request
from bs4 import BeautifulSoup

c_data = "https://en.wikipedia.org/wiki/List_of_Golden_State_Warriors_seasons"  # Wikipedia page
c_page = urllib.request.urlopen(c_data)
c_soup = BeautifulSoup(c_page, "lxml")
c_table = c_soup.find('table', class_='wikitable')  # this is the problem: find() returns only the first wikitable
c_year = []
c_rate = []
for row in c_table.findAll('tr'):  # setup for dataframe
    cells = row.findAll('td')
    if len(cells) == 13:
        # list.append() returns None, so append in place instead of reassigning
        c_year.append(cells[0].find(text=True))
        c_rate.append(cells[9].find(text=True))
print(c_year, c_rate)
Use pd.read_html to get all the tables on the page:
import pandas as pd
# read tables
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_Golden_State_Warriors_seasons')
print(len(tables))
>>> 18
tables[0]
    0        1
0   AHC      NBA All-Star Game Head Coach
1   AMVP     All-Star Game Most Valuable Player
2   COY      Coach of the Year
3   DPOY     Defensive Player of the Year
4   Finish   Final position in division standings
5   GB       Games behind first-place team in division[b]
6   Italics  Season in progress
7   Losses   Number of regular season losses
8   EOY      Executive of the Year
9   FMVP     Finals Most Valuable Player
10  MVP      Most Valuable Player
11  ROY      Rookie of the Year
12  SIX      Sixth Man of the Year
13  SPOR     Sportsmanship Award
14  Wins     Number of regular season wins
# display all dataframes in tables
for i, table in enumerate(tables):
    print(f'Table {i}')
    display(table)  # display() is provided by Jupyter/IPython; use print(table) in a plain script
    print('\n')
df_i_want = tables[x]  # x is the index of the table you want (0-indexed)
# delete the list of tables once you have kept the one you need
del tables
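As for choosing the right table in the future: a hard-coded index like tables[x] breaks if the page gains or loses a table, so it can be more robust to pick the table by its column headers. Below is a minimal sketch of that idea. The helper name pick_table is hypothetical, the column names "Season" and "Win%" are assumptions about the Seasons table (check the real headers with table.columns first), and the two small DataFrames are stand-ins so the example runs without network access.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame whose column labels include every required name.

    pick_table is a hypothetical helper written for this example; it is not part of pandas.
    """
    required = set(required_columns)
    for df in tables:
        labels = set()
        for col in df.columns:
            # With multi-row headers pandas yields tuples; collect the text of every level.
            parts = col if isinstance(col, tuple) else (col,)
            labels.update(str(p) for p in parts)
        if required <= labels:
            return df
    raise ValueError(f"no table has columns {sorted(required)}")


# Stand-in tables (assumed shapes) in place of pd.read_html(url).
key_table = pd.DataFrame({
    0: ["AHC", "MVP"],
    1: ["NBA All-Star Game Head Coach", "Most Valuable Player"],
})
seasons_table = pd.DataFrame({
    "Season": ["2014-15", "2015-16"],
    "Wins": [67, 73],
    "Losses": [15, 9],
    "Win%": [0.817, 0.890],
})

# The key table comes first, as on the Wikipedia page, but is skipped.
seasons = pick_table([key_table, seasons_table], ["Season", "Win%"])
print(seasons)
```

With the real page you would pass the list returned by pd.read_html instead of the stand-in list. pd.read_html also accepts a match argument (a string or regex) that keeps only the tables containing matching text, which can narrow the list before you pick from it.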