How do I select a specific table using Beautiful Soup?


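I have been searching for a while but cannot find an answer.

I am trying to scrape the "Results by state" table available on Wikipedia at the following link: https://en.wikipedia.org/wiki/2020_United_States_presidential_election#Results_by_state

So far, when I run my code, I can only get it to extract data from the "Joe Biden vs Donald Trump" table higher up on the page.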


import requests
from bs4 import BeautifulSoup

website = 'https://en.wikipedia.org/wiki/2020_United_States_presidential_election'

# Download the page and keep its HTML as text
result = requests.get(website)
content = result.text

soup = BeautifulSoup(content, "html.parser")

# find() returns the first element matching these classes
tables = soup.find("table", class_="wikitable sortable")
for table in tables:
    if 'Results by state' in table.text:
        headers = [header.text.strip() for header in table.find_all('th')]
        rows = []
        table_rows = table.find_all('tr')
        for row in table_rows:
            # Collect the text of every data cell in this row
            cells = row.find_all('td')
            rows.append([cell.text for cell in cells])

python web-scraping beautifulsoup
1 Answer

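I'm not sure what is going wrong, because your code seems to work. However, the easiest way to scrape tables is to use pandas.read_html() and try to match a pattern in the table: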

import pandas as pd

pd.read_html('https://en.wikipedia.org/wiki/2020_United_States_presidential_election#Results_by_state', match='Results by state')[0]

If you use BeautifulSoup directly, try to select the table more specifically, for example with CSS selectors:

tables = soup.select('table:has(caption:-soup-contains("Results by state"))')
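
To illustrate the selector approach without depending on the live page, here is a minimal sketch that runs against a small inline HTML snippet. The snippet, its captions, and the placeholder cell values are assumptions made up for illustration; the real Wikipedia markup may differ, and the target table may not have a caption with this exact text. It also shows one possible factor in the original problem: find() returns only the first matching table, so a more specific selector is needed to reach a later one.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two tables sharing the same classes.
HTML = """
<table class="wikitable sortable">
  <caption>Summary</caption>
  <tr><th>Candidate</th><th>Party</th></tr>
  <tr><td>Candidate X</td><td>Party P</td></tr>
</table>
<table class="wikitable sortable">
  <caption>Results by state</caption>
  <tr><th>State</th><th>Winner</th></tr>
  <tr><td>State A</td><td>Candidate X</td></tr>
  <tr><td>State B</td><td>Candidate Y</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() stops at the first match, which here is the "Summary" table.
first = soup.find("table", class_="wikitable sortable")

# select_one() with a caption filter picks the table by its caption text.
table = soup.select_one('table:has(caption:-soup-contains("Results by state"))')

# Header cells and data rows of the selected table.
headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has no <td> cells, so it is skipped
        rows.append(cells)
```

On a real page, select_one() returns None when nothing matches, so it is worth checking the result before calling find_all() on it.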