Why can't BeautifulSoup find a specific table element in the HTML?


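I can't get my code to find the text in the race form table (highlighted in the element below). What is the appropriate element to actually get that text?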

import requests
from bs4 import BeautifulSoup

# URL of the webpage to scrape
url = "https://www.racingandsports.com.au/thoroughbred/horse/smokin-rubi/1978955"

# Define the user agent header
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# Send a GET request to the URL with the user agent header
response = requests.get(url, headers=headers)

# Parse the HTML content of the page
soup = BeautifulSoup(response.content, 'html.parser')

# Find the table element with class 'table table-condensed table-striped table-hover tbl-race-form'
table_element = soup.find('table', class_='table table-condensed table-striped table-hover tbl-race-form')

# Check if the table element is found
if table_element:
    # Extract text from the table
    table_text = table_element.get_text(separator='\n', strip=True)
    print(table_text)
else:
    print("No table with the specified class found on the page.")

Tags: python, web-scraping, beautifulsoup, python-requests

1 Answer

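The content you expect is loaded/rendered dynamically. It is not part of the static response you get via requests, so BeautifulSoup never sees the table.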

However, to get a response that does contain the table, change your URL to:

url = "https://www.racingandsports.com.au/Horse/GetRaceFormPartialTable?horseIdStr=1978955&dic=thoroughbred"
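
Putting that URL to use, here is a minimal sketch of the whole request. It assumes the endpoint returns an HTML fragment containing the race form table and that the query parameters can be passed separately as shown in the URL above. The helper names (extract_table_rows, fetch_race_form_rows) are hypothetical and only for illustration.

```python
import requests
from bs4 import BeautifulSoup

# Endpoint taken from the URL above; the query string is passed via `params`.
PARTIAL_URL = "https://www.racingandsports.com.au/Horse/GetRaceFormPartialTable"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}


def extract_table_rows(html):
    """Return each table row as a list of cell strings (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Prefer the race form table; fall back to the first table, because the
    # exact markup of the fragment is an assumption.
    table = soup.select_one("table.tbl-race-form") or soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(" ", strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def fetch_race_form_rows(horse_id, dic="thoroughbred"):
    """Request the partial table for one horse and return its rows."""
    response = requests.get(
        PARTIAL_URL,
        params={"horseIdStr": horse_id, "dic": dic},
        headers=HEADERS,
        timeout=30,
    )
    response.raise_for_status()
    return extract_table_rows(response.text)


# Example (needs network access):
# for row in fetch_race_form_rows(1978955):
#     print(row)
```

Selecting on the single class tbl-race-form is more forgiving than matching the full class string, which only succeeds when every class appears in exactly that order.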

How can you tell whether content is loaded/rendered dynamically in a case like this?

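First indicator: open the site in a browser as a human would and watch for a loading animation or delay in that area. Second indicator: the content is not included in the static response to your request. You can then open the Network panel of your browser's developer tools and filter by XHR requests to see which data is loaded from which resource. -> http://developer.chrome.com/docs/devtools/network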
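
The second indicator can be checked from Python as well. The sketch below is only a rough heuristic with hypothetical helper names: it fetches the static HTML and reports which marker strings (for example a class name you see in the browser's inspector) are absent from it.

```python
import requests


def fetch_static_html(url, user_agent="Mozilla/5.0"):
    """Return the raw HTML exactly as requests receives it (no JavaScript runs)."""
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=30)
    response.raise_for_status()
    return response.text


def missing_markers(html, markers):
    """Return the markers that do not occur anywhere in the static HTML."""
    lowered = html.lower()
    return [marker for marker in markers if marker.lower() not in lowered]


# Example (needs network access):
# html = fetch_static_html(
#     "https://www.racingandsports.com.au/thoroughbred/horse/smokin-rubi/1978955"
# )
# print(missing_markers(html, ["tbl-race-form"]))
```

A marker that comes back as missing suggests the element is added later by JavaScript. The reverse is weaker evidence, since a class name can also appear in inline scripts or styles without the table itself being present.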

If there is an API, use it; otherwise use selenium.
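
For the selenium fallback, a minimal sketch might look like the following. It assumes Selenium 4 with a local Chrome installation, and it assumes the rendered table can be located with the CSS selector table.tbl-race-form (taken from the class names in the question, not verified against the live page). The function name render_table_text is hypothetical.

```python
def render_table_text(url, css_selector="table.tbl-race-form", timeout=15):
    """Load the page in headless Chrome and return the table's visible text."""
    # Imported inside the function so this module can be imported even when
    # Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has inserted the table into the DOM;
        # raises TimeoutException if it never appears.
        table = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return table.text
    finally:
        driver.quit()  # always close the browser, even on errors


# Example (needs Chrome and network access):
# print(render_table_text(
#     "https://www.racingandsports.com.au/thoroughbred/horse/smokin-rubi/1978955"
# ))
```

Driving a real browser is slower and heavier than calling the endpoint directly, which is why the API route is preferable whenever one exists.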
