Why can't BeautifulSoup find a specific table element in the HTML?

Question

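I can't get my code to find the text in the race form table (highlighted in the element below). What is the appropriate element to use to actually get that text?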

import requests
from bs4 import BeautifulSoup

# URL of the webpage to scrape
url = "https://www.racingandsports.com.au/thoroughbred/horse/smokin-rubi/1978955"

# Define the user agent header
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# Send a GET request to the URL with the user agent header
response = requests.get(url, headers=headers)

# Parse the HTML content of the page
soup = BeautifulSoup(response.content, 'html.parser')

# Find the table element with class 'table table-condensed table-striped table-hover tbl-race-form'
table_element = soup.find('table', class_='table table-condensed table-striped table-hover tbl-race-form')

# Check if the table element is found
if table_element:
    # Extract text from the table
    table_text = table_element.get_text(separator='\n', strip=True)
    print(table_text)
else:
    print("No table with the specified class found on the page.")

Answer 1

The expected content is loaded/rendered dynamically and is not part of the static response you get via requests.

However, to get a response that does contain the table, change your URL:

url = "https://www.racingandsports.com.au/Horse/GetRaceFormPartialTable?horseIdStr=1978955&dic=thoroughbred"
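With that URL, the rest of the script can stay close to the original. The following is a minimal sketch, not a verified scraper: it assumes the endpoint returns an HTML fragment containing a table, and the helper names extract_rows and fetch_race_form are hypothetical. The fallback to the first table on the page is also an assumption, in case the fragment does not carry the tbl-race-form class.

```python
import requests
from bs4 import BeautifulSoup

# Endpoint taken from the answer above.
PARTIAL_URL = (
    "https://www.racingandsports.com.au/Horse/GetRaceFormPartialTable"
    "?horseIdStr=1978955&dic=thoroughbred"
)
HEADERS = {"User-Agent": "Mozilla/5.0"}


def extract_rows(html):
    """Return every table row as a list of cell strings (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on a single class; fall back to the first table (assumption).
    table = soup.select_one("table.tbl-race-form") or soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def fetch_race_form(url=PARTIAL_URL):
    """Download the partial table and return its rows (hypothetical helper)."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return extract_rows(response.text)
```

Calling fetch_race_form() and printing each returned row should show the race form data, provided the endpoint still responds the way the answer describes.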

In a case like this, how do you know whether the content is loaded/rendered dynamically?

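First indicator: when you open the site in a browser as a human, you notice a loading animation or a delay in that area. Second indicator: the content is not included in the static response to the request. You can then open the browser's developer tools and look at the XHR requests in the Network panel to see which data is loaded from which resources. -> http://developer.chrome.com/docs/devtools/network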
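The second indicator can also be checked from a script. This is a rough sketch under stated assumptions: the helper names has_element and is_in_static_response are hypothetical, and the selector table.tbl-race-form is taken from the class list in the question.

```python
import requests
from bs4 import BeautifulSoup


def has_element(html, css_selector):
    """Return True if the HTML string contains a match for the selector."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(css_selector) is not None


def is_in_static_response(url, css_selector, headers=None):
    """Fetch the page without running JavaScript and look for the element."""
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return has_element(response.text, css_selector)
```

If is_in_static_response(url, "table.tbl-race-form") returns False for a table that is visible in the browser, the element is most likely being added after the initial page load.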

If there is an API, use it; otherwise use selenium.
