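I can't get my code to find the text in the race form table (highlighted in the element below). What is the appropriate element to actually get that text?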
import requests
from bs4 import BeautifulSoup
# URL of the webpage to scrape
url = "https://www.racingandsports.com.au/thoroughbred/horse/smokin-rubi/1978955"
# Define the user agent header
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
# Send a GET request to the URL with the user agent header
response = requests.get(url, headers=headers)
# Parse the HTML content of the page
soup = BeautifulSoup(response.content, 'html.parser')
# Find the table element with class 'table table-condensed table-striped table-hover tbl-race-form'
table_element = soup.find('table', class_='table table-condensed table-striped table-hover tbl-race-form')
# Check if the table element is found
if table_element:
    # Extract text from the table
    table_text = table_element.get_text(separator='\n', strip=True)
    print(table_text)
else:
    print("No table with the specified class found on the page.")
The content you expect is loaded and rendered dynamically; it is not part of the static response you get with requests.
However, to get a response that does contain the table, change your URL:
url = "https://www.racingandsports.com.au/Horse/GetRaceFormPartialTable?horseIdStr=1978955&dic=thoroughbred"
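As a rough sketch of how the rest of the script might look with that URL: the helper names `fetch_html` and `table_rows` are hypothetical, and the code assumes the endpoint returns an HTML fragment containing a `<table>`, which has not been verified here.

```python
import requests
from bs4 import BeautifulSoup

# URL from the answer above; the headers mirror the question's script.
PARTIAL_URL = (
    "https://www.racingandsports.com.au/Horse/GetRaceFormPartialTable"
    "?horseIdStr=1978955&dic=thoroughbred"
)
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    )
}


def fetch_html(url, headers=HEADERS, timeout=10):
    """Download a page and return its body as text (hypothetical helper)."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def table_rows(html):
    """Return the first table in the HTML as a list of rows of cell text.

    Assumes the markup contains a <table>; returns [] if it does not.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows
```

Calling `table_rows(fetch_html(PARTIAL_URL))` would then give one list per table row, provided the endpoint still responds the way the answer describes.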
How do you know in a case like this whether the content is loaded/rendered dynamically?
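First indicator: open the website in a browser as a human would and notice that a loading animation or a delay appears in that area. Second indicator: the content is not included in the static response to your request. You can then use your browser's developer tools and look at the XHR requests in the Network tab to see which data is loaded from which resources. -> http://developer.chrome.com/docs/devtools/network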
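The second indicator can be checked in code. The following is a minimal sketch; `has_element` is a hypothetical helper, and the selector `table.tbl-race-form` is taken from the class names in the question.

```python
from bs4 import BeautifulSoup


def has_element(html, css_selector):
    """Return True if the given HTML contains an element matching the selector."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(css_selector) is not None
```

For example, `has_element(requests.get(url, headers=headers).text, "table.tbl-race-form")` tests the static response of the original page. If it returns False while the table is visible in the browser, the table is most likely filled in by a later request.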
If there is an API, use it; otherwise use selenium.