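Below is my code. I can't figure out how to make Selenium keep pressing the "See More Matches" button until it shows all 66k results, rather than about 20 at a time. I have tried copying the XPath of several elements, since my understanding of HTML is limited and I'm new to coding.
When I copy the XPath of "See More Matches" directly, I get a different error: invalid selector: The result of the xpath expression "//*[@id="see-more"]/div/hzn-button/text()" is: [object Text]. It should be an element.
When I copy the XPath of <button class.....>, I get the error shown below. I'm trying to do web scraping on my own and run my first analysis. Thanks in advance for your help.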
#creating a bot with selenium
from selenium import webdriver
import urllib3
import re
import time
import pandas as pd
website = 'https://www.carmax.com/cars/all'
service = webdriver.ChromeService(executable_path = '/Users/apple/Downloads/chromedriver-mac-arm64/chromedriver')
driver = webdriver.Chrome(service=service)
driver.get(website)
driver.find_element("xpath",'//*[@id="see-more"]/div/hzn-button//button')
---------------------------------------------------------------------------
NoSuchElementException Traceback (most recent call last)
Cell In[9], line 1
----> 1 driver.find_element("xpath",'//*[@id="see-more"]/div/hzn-button//button')
File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py:741, in WebDriver.find_element(self, by, value)
738 by = By.CSS_SELECTOR
739 value = f'[name="{value}"]'
--> 741 return self.execute(Command.FIND_ELEMENT, {"using": by, "value": value})["value"]
File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py:347, in WebDriver.execute(self, driver_command, params)
345 response = self.command_executor.execute(driver_command, params)
346 if response:
--> 347 self.error_handler.check_response(response)
348 response["value"] = self._unwrap_value(response.get("value", None))
349 return response
File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/remote/errorhandler.py:229, in ErrorHandler.check_response(self, response)
227 alert_text = value["alert"].get("text")
228 raise exception_class(message, screen, stacktrace, alert_text) # type: ignore[call-arg] # mypy is not smart enough here
--> 229 raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="see-more"]/div/hzn-button//button"}
(Session info: chrome=120.0.6099.109); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
0 chromedriver 0x0000000102dca004 chromedriver + 4169732
1 chromedriver 0x0000000102dc1ff8 chromedriver + 4136952
2 chromedriver 0x0000000102a17500 chromedriver + 292096
3 chromedriver 0x0000000102a5c7a0 chromedriver + 575392
4 chromedriver 0x0000000102a97818 chromedriver + 817176
5 chromedriver 0x0000000102a505e8 chromedriver + 525800
6 chromedriver 0x0000000102a514b8 chromedriver + 529592
7 chromedriver 0x0000000102d90334 chromedriver + 3932980
8 chromedriver 0x0000000102d94970 chromedriver + 3950960
9 chromedriver 0x0000000102d78774 chromedriver + 3835764
10 chromedriver 0x0000000102d95478 chromedriver + 3953784
11 chromedriver 0x0000000102d6aab4 chromedriver + 3779252
12 chromedriver 0x0000000102db1914 chromedriver + 4069652
13 chromedriver 0x0000000102db1a90 chromedriver + 4070032
14 chromedriver 0x0000000102dc1c70 chromedriver + 4136048
15 libsystem_pthread.dylib 0x000000018414ffa8 _pthread_start + 148
16 libsystem_pthread.dylib 0x000000018414ada0 thread_start + 8
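Your button is inside a shadow root. A regular XPath lookup from the document cannot cross the shadow boundary, which is why the locator ending in //button raises NoSuchElementException. To reach the internal shadow DOM structure, first locate its host element, then read the host's shadowRoot property.
In this example, the shadow host is the element with locator //*[@id="see-more"]//hzn-button. The get_shadow_root function uses the JS executor to get the shadow root from the host.
Note: in your case, you can simply click the element with locator //*[@id="see-more"]//hzn-button instead of going into the shadow root.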
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
timeout = 10
wait = WebDriverWait(driver, timeout)
def get_shadow_root(element):
    return driver.execute_script('return arguments[0].shadowRoot', element)
driver.get("https://www.carmax.com/cars/all")
driver.maximize_window()
shadow_host = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="see-more"]//hzn-button')))
see_more = get_shadow_root(shadow_host).find_element(By.CSS_SELECTOR, 'button.hzn-button')
see_more.click()
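The snippet above clicks the button once. To keep loading results, the click can be repeated until the button no longer shows up. The following is only a minimal sketch under some assumptions: that the site removes (or stops exposing) the "See More Matches" host element once everything is loaded, and that a short pause is enough for the next batch to render. The names click_until_gone, find_see_more, run_carmax, max_clicks and pause are made up for illustration and are not part of Selenium.

```python
import time


def click_until_gone(find_button, max_clicks=None, pause=0.0):
    """Click the element returned by find_button() until it returns None.

    find_button: callable returning a clickable element, or None once the
                 button is no longer available (hypothetical helper contract).
    max_clicks:  optional safety cap so the loop cannot run forever.
    pause:       seconds to sleep after each click while results load.
    Returns the number of clicks performed.
    """
    clicks = 0
    while max_clicks is None or clicks < max_clicks:
        button = find_button()
        if button is None:
            break
        button.click()
        clicks += 1
        time.sleep(pause)
    return clicks


def run_carmax():
    # Selenium is imported here so the helper above stays usable without a browser.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)
    driver.get("https://www.carmax.com/cars/all")

    def find_see_more():
        # Assumption: a timeout means the button is gone and all results are loaded.
        try:
            return wait.until(
                EC.element_to_be_clickable((By.XPATH, '//*[@id="see-more"]//hzn-button'))
            )
        except TimeoutException:
            return None

    # The cap of 50 clicks is arbitrary; raise or remove it once the loop behaves.
    clicks = click_until_gone(find_see_more, max_clicks=50, pause=1.0)
    print(f"Clicked 'See More Matches' {clicks} times")
    driver.quit()
```

Calling run_carmax() starts the browser and runs the loop. With roughly 20 results per click, reaching all 66k listings would take thousands of clicks, so it is probably worth testing with a small max_clicks first.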