Unable to find a specific table on Wikipedia via web scraping?

Question

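I am scraping the following Wikipedia page: https://en.wikipedia.org/wiki/Eurovision_Song_Contest_2022. I have already been able to scrape another table from this page, but now I want to scrape the table "Participants and results of the first semi-final of the Eurovision Song Contest 2022". However, I can't seem to find that table. This is my code: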

import requests
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/wiki/Eurovision_Song_Contest_2022'
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Collect every table whose class attribute is exactly this string
table_2022 = soup.find_all('table', class_="wikitable sortable plainrowheaders")
semifinal1_2022 = table_2022[2]

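The full class name in the HTML ends with jquery-tablesorter, but based on other similar questions I left that part out.

I use find_all because the Wikipedia page contains several tables. The problem is that I can't seem to get the table I want. If I select

table_2022[2]

I get the table before the one I want (the one with the different pots), and when I select

table_2022[3]

I get "Detailed jury voting results of semi-final 1", which is the first table after the one I want that has "wikitable plainheaders" as its class. There are several tables between the pots table and "Detailed jury voting results of semi-final 1", but I can't seem to access any of them.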

I also tried to find the table by searching for its caption, like this:

tables = soup.find_all('table')

desired_caption = "Participants and results of the first semi-final of the Eurovision Song Contest 2022"

desired_table = None
for table in tables:
    caption = table.find('caption')
    if caption and caption.text.strip() == desired_caption:
        desired_table = table
        break

# If the desired table is found, scrape it
if desired_table:
    # Do your scraping logic here
    pass
else:
    print("Desired table not found.")

But here the output I get is "Desired table not found."

Answer 1

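You are searching for a caption with this text:

"Participants and results of the first semi-final of the Eurovision Song Contest 2022"

But the actual table has the following caption text:

"Participants and results of the first semi-final of the Eurovision Song Contest 2022[149]"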
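One way to spot this kind of mismatch is to print every caption the parser actually sees. The following is a minimal sketch, not the only way to do it: SAMPLE_HTML is a made-up stand-in for the downloaded page so the snippet runs offline, and list_captions is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.content; the real page is much larger.
SAMPLE_HTML = """
<table class="wikitable">
  <caption>Participants and results of the first semi-final of the Eurovision Song Contest 2022<sup class="reference">[149]</sup></caption>
  <tr><th>Country</th></tr>
</table>
<table class="wikitable"><tr><th>No caption here</th></tr></table>
"""


def list_captions(html):
    """Return the stripped caption text of every table that has a caption."""
    soup = BeautifulSoup(html, "html.parser")
    captions = []
    for table in soup.find_all("table"):
        caption = table.find("caption")
        if caption is not None:  # tables without a caption are skipped
            captions.append(caption.get_text().strip())
    return captions


captions = list_captions(SAMPLE_HTML)
for text in captions:
    # repr() makes trailing footnote markers and stray whitespace visible
    print(repr(text))
```

With the real page you would pass response.content instead of SAMPLE_HTML and compare the printed strings with the one you are searching for.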

You can search for a different exact match:

desired_caption = "Participants and results of the first semi-final of the Eurovision Song Contest 2022[149]"

Or keep the old string and check whether it is contained in the actual caption:

desired_caption = "Participants and results of the first semi-final of the Eurovision Song Contest 2022"
# ...
if caption and desired_caption in caption.text.strip():
    desired_table = table
    break
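Since footnote numbers such as [149] can change whenever the article is edited, a third option is to drop the footnote markers before comparing. This is only a sketch under stated assumptions: SAMPLE_HTML and its class names are invented for illustration, the helper names are hypothetical, and it assumes the footnote markers are rendered as <sup> elements inside the caption.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (assumed markup, not copied from Wikipedia).
SAMPLE_HTML = """
<table class="sortable wikitable plainrowheaders">
  <caption>Participants and results of the first semi-final of the Eurovision Song Contest 2022<sup class="reference">[149]</sup></caption>
  <tr><th>Country</th></tr>
  <tr><td>Albania</td></tr>
</table>
<table class="wikitable plainrowheaders">
  <caption>Detailed jury voting results of semi-final 1<sup class="reference">[150]</sup></caption>
  <tr><td>Other data</td></tr>
</table>
"""

WANTED = "Participants and results of the first semi-final of the Eurovision Song Contest 2022"


def caption_without_references(caption):
    """Return the caption text with <sup> footnote markers left out."""
    # Re-parse a copy so the original tree is not modified.
    copy = BeautifulSoup(str(caption), "html.parser")
    for sup in copy.find_all("sup"):
        sup.decompose()
    return copy.get_text().strip()


def find_table_by_caption(html, wanted):
    """Return the first wikitable whose cleaned caption equals `wanted`, else None."""
    soup = BeautifulSoup(html, "html.parser")
    # select() matches individual classes, so extra classes or a
    # different class order do not prevent a match.
    for table in soup.select("table.wikitable"):
        caption = table.find("caption")
        if caption is not None and caption_without_references(caption) == wanted:
            return table
    return None


desired_table = find_table_by_caption(SAMPLE_HTML, WANTED)
if desired_table is not None:
    print(desired_table.find("td").get_text())
else:
    print("Desired table not found.")
```

The CSS selector is used here because a string passed to class_ that contains spaces is compared against the whole class attribute, so a table whose classes appear in a different order would be missed by find_all with that string.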
