On the page https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i9-11900K+%40+3.50GHz&id=3904 I am trying to scrape all of the tooltip information (the CPU price and the date) shown in the "Pricing History" section.
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service()
driver = webdriver.Chrome(options=options, service=webdriver_service)
driver.get('https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i9-11900K+%40+3.50GHz&id=3904')
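# presence_of_element_located returns a single WebElement, not a list,
# so the for loop below raises TypeError: 'WebElement' object is not iterable
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@id='placeholder']/div/canvas[2]")))
for el in element:
    ActionChains(driver).move_to_element(el).perform()
    # By has no SELECTOR attribute; CSS selectors use By.CSS_SELECTOR.
    # "placeholder" is an id (see the XPath above), so it is written #placeholder in CSS
    mouseover = WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#placeholder > div > div.canvasjs-chart-tooltip > div > span")))
    print(mouseover.text)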
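But the result is: 'WebElement' object is not iterable. Is there something I need to change? Or is there a better way to scrape the mouseover information for all of the prices and dates in the "Pricing History" section? Thanks for your help!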
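The chart data is embedded in the page source as JavaScript dataArray.push({x: ..., y: ...}) calls, so you don't need Selenium or mouseover events at all: fetch the HTML with requests and extract the values with a regular expression. To load the time/price pairs into a pandas DataFrame, you can use the following example: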
import re
import pandas as pd
import requests
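url = (
    "https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i9-11900K+%40+3.50GHz&id=3904"
)
# The chart data sits in the page source as dataArray.push({x: <timestamp>, y: <price>}) calls
html_text = requests.get(url).text
df = pd.DataFrame(
    re.findall(r"dataArray\.push\({x: (\d+), y: ([\d.]+)}", html_text),
    columns=["time", "price"],
)
# x is in milliseconds since the Unix epoch: convert to seconds, then to datetime
df["time"] = pd.to_datetime(df["time"].astype(int) // 1000, unit="s")
print(df.tail())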
Prints:
time price
236 2023-05-28 06:00:00 317.86
237 2023-05-29 06:00:00 319.43
238 2023-05-30 06:00:00 429.99
239 2023-05-31 06:00:00 314.64
240 2023-06-01 06:00:00 318.9
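If you want to check the regex without hitting the site, here is a minimal offline sketch. It runs the same pattern against a small HTML fragment; the sample_html string and its values are hypothetical, written only to mimic the dataArray.push({x: ..., y: ...}) calls the pattern expects. It also converts the price column to float (re.findall returns strings) and uses unit="ms", which matches the // 1000 plus unit="s" conversion above when the timestamps are whole seconds.

```python
import re

import pandas as pd

# Hypothetical fragment mimicking the JavaScript in the page source;
# the timestamps and prices are made up for illustration.
sample_html = """
<script>
dataArray.push({x: 1685253600000, y: 317.86});
dataArray.push({x: 1685340000000, y: 319.43});
</script>
"""

# Same pattern as in the answer: capture the millisecond timestamp and the price.
pattern = r"dataArray\.push\({x: (\d+), y: ([\d.]+)}"
rows = re.findall(pattern, sample_html)

df = pd.DataFrame(rows, columns=["time", "price"])
# x is milliseconds since the Unix epoch, so unit="ms" converts it directly.
df["time"] = pd.to_datetime(df["time"].astype("int64"), unit="ms")
# re.findall yields strings; make the price numeric for plotting or arithmetic.
df["price"] = df["price"].astype(float)
print(df)
```

The same float conversion can be added to the answer's code if you plan to compute with the prices rather than just print them.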