Web scraping CarMax with Selenium and Python: unable to find the "See More Matches" button located inside a shadow-root


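Below is my code. I can't figure out how to make Selenium keep pressing the "See More Matches" button until the page shows all 66k results instead of only about 20 at a time. I've tried copying the XPath of several different elements, because my understanding of HTML is limited and I'm new to coding.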

When I copy the XPath of "See More Matches" directly, I get a different error:

invalid selector: The result of the xpath expression "//*[@id="see-more"]/div/hzn-button/text()" is: [object Text]. It should be an element.
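Selenium's find_element only accepts locators that resolve to elements, while a copied XPath ending in /text() selects the text node inside the element, which is exactly what the error complains about. The sketch below uses a hypothetical helper, element_xpath, that drops a trailing text() step so the expression selects the <hzn-button> element itself. Note that this element is a shadow host, so reaching the inner <button> still requires the shadow-root handling described in the answer.

```python
import re

def element_xpath(xpath: str) -> str:
    """Hypothetical helper: strip a trailing /text() or //text() step.

    find_element needs an XPath that resolves to an element; one ending
    in text() resolves to a text node and raises "invalid selector".
    """
    return re.sub(r"/{1,2}text\(\)\s*$", "", xpath.strip())

copied = '//*[@id="see-more"]/div/hzn-button/text()'
print(element_xpath(copied))  # prints //*[@id="see-more"]/div/hzn-button
```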


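When I copy the XPath of

<button class.....>

instead, I get the error shown below the code. I'm teaching myself web scraping so I can do my first analysis. Thanks in advance for your help.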

#creating a bot with selenium
from selenium import webdriver
import urllib3
import re
import time
import pandas as pd
website = 'https://www.carmax.com/cars/all'
service = webdriver.ChromeService(executable_path = '/Users/apple/Downloads/chromedriver-mac-arm64/chromedriver')
driver = webdriver.Chrome(service=service)
driver.get(website)
driver.find_element("xpath",'//*[@id="see-more"]/div/hzn-button//button')
---------------------------------------------------------------------------
NoSuchElementException                    Traceback (most recent call last)
Cell In[9], line 1
----> 1 driver.find_element("xpath",'//*[@id="see-more"]/div/hzn-button//button')

File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py:741, in WebDriver.find_element(self, by, value)
    738     by = By.CSS_SELECTOR
    739     value = f'[name="{value}"]'
--> 741 return self.execute(Command.FIND_ELEMENT, {"using": by, "value": value})["value"]

File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py:347, in WebDriver.execute(self, driver_command, params)
    345 response = self.command_executor.execute(driver_command, params)
    346 if response:
--> 347     self.error_handler.check_response(response)
    348     response["value"] = self._unwrap_value(response.get("value", None))
    349     return response

File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/remote/errorhandler.py:229, in ErrorHandler.check_response(self, response)
    227         alert_text = value["alert"].get("text")
    228     raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 229 raise exception_class(message, screen, stacktrace)

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="see-more"]/div/hzn-button//button"}
  (Session info: chrome=120.0.6099.109); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
0   chromedriver                        0x0000000102dca004 chromedriver + 4169732
1   chromedriver                        0x0000000102dc1ff8 chromedriver + 4136952
2   chromedriver                        0x0000000102a17500 chromedriver + 292096
3   chromedriver                        0x0000000102a5c7a0 chromedriver + 575392
4   chromedriver                        0x0000000102a97818 chromedriver + 817176
5   chromedriver                        0x0000000102a505e8 chromedriver + 525800
6   chromedriver                        0x0000000102a514b8 chromedriver + 529592
7   chromedriver                        0x0000000102d90334 chromedriver + 3932980
8   chromedriver                        0x0000000102d94970 chromedriver + 3950960
9   chromedriver                        0x0000000102d78774 chromedriver + 3835764
10  chromedriver                        0x0000000102d95478 chromedriver + 3953784
11  chromedriver                        0x0000000102d6aab4 chromedriver + 3779252
12  chromedriver                        0x0000000102db1914 chromedriver + 4069652
13  chromedriver                        0x0000000102db1a90 chromedriver + 4070032
14  chromedriver                        0x0000000102dc1c70 chromedriver + 4136048
15  libsystem_pthread.dylib             0x000000018414ffa8 _pthread_start + 148
16  libsystem_pthread.dylib             0x000000018414ada0 thread_start + 8
1 Answer (0 votes)

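Your button is inside a shadow-root. To access the structure inside a shadow root, you first have to get its host element and then read the host's shadowRoot property.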

In this example, the shadow host is the element with the locator

//*[@id="see-more"]//hzn-button

The get_shadow_root function uses the JavaScript executor (driver.execute_script) to get the shadow root from the host.
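As an alternative to the execute_script call, recent Selenium 4 releases expose a shadow_root property on WebElement that returns a ShadowRoot object with its own find_element method. This is a minimal sketch assuming a Chromium-based driver; find_in_shadow, HOST_XPATH and INNER_CSS are hypothetical names, and the button.hzn-button selector is taken from the answer's code, so it may break if CarMax changes its markup. The strings "xpath" and "css selector" are the values of By.XPATH and By.CSS_SELECTOR, which lets the function be defined without importing selenium.

```python
# Sketch assuming Selenium 4 with a Chromium-based driver, where
# WebElement.shadow_root returns a ShadowRoot object.
HOST_XPATH = '//*[@id="see-more"]//hzn-button'  # shadow host (from the answer)
INNER_CSS = "button.hzn-button"                 # inner button (from the answer)

def find_in_shadow(driver, host_xpath=HOST_XPATH, inner_css=INNER_CSS):
    """Hypothetical helper: locate an element inside a shadow root.

    Step 1: find the shadow host with a normal document-level lookup.
    Step 2: read its shadow root and search inside it with CSS, since
    Chromium-based drivers generally reject XPath inside a shadow root.
    """
    host = driver.find_element("xpath", host_xpath)  # "xpath" == By.XPATH
    # "css selector" == By.CSS_SELECTOR
    return host.shadow_root.find_element("css selector", inner_css)
```

With a real driver this would be used as find_in_shadow(driver).click().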

More about Shadow DOM

Note: in your case, you can simply click the element with the locator

//*[@id="see-more"]//hzn-button

instead of going into the shadow-root.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
timeout = 10
wait = WebDriverWait(driver, timeout)

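def get_shadow_root(element):
    # Ask the browser for the element's shadowRoot property via JavaScript
    return driver.execute_script('return arguments[0].shadowRoot', element)

driver.get("https://www.carmax.com/cars/all")
driver.maximize_window()

# Wait until the shadow host (the <hzn-button> custom element) is in the DOM
shadow_host = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="see-more"]//hzn-button')))
# Search inside the shadow root with a CSS selector (XPath lookups are not supported there in Chrome)
see_more = get_shadow_root(shadow_host).find_element(By.CSS_SELECTOR, 'button.hzn-button')
see_more.click()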
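The code above clicks the button once, while the question asks to keep clicking until all results are loaded. At roughly 20 results per click, 66k results means on the order of 3,300 clicks (66,000 / 20), so a click cap and a pause between clicks are worth having. The sketch below follows the note about clicking the shadow host directly and re-finds it on every iteration, which avoids holding a stale reference. It assumes the host is removed from the page (not merely hidden) once everything is loaded, which the answer does not confirm; click_see_more_until_gone, max_clicks and pause are hypothetical names.

```python
import time

# Shadow host from the answer; the note says clicking it directly is enough.
SEE_MORE_XPATH = '//*[@id="see-more"]//hzn-button'

def click_see_more_until_gone(driver, max_clicks=50, pause=1.0):
    """Hypothetical helper: click "See More Matches" until it disappears.

    find_elements returns an empty list instead of raising when nothing
    matches, so it doubles as an existence check. Stops after max_clicks
    as a safety cap and returns the number of clicks made.
    """
    clicks = 0
    while clicks < max_clicks:
        hosts = driver.find_elements("xpath", SEE_MORE_XPATH)  # "xpath" == By.XPATH
        if not hosts:
            break  # button gone: assume all results are loaded
        hosts[0].click()
        clicks += 1
        time.sleep(pause)  # crude fixed wait for the next batch to render
    return clicks
```

A fixed sleep is the simplest choice here; waiting with WebDriverWait for the result count to grow would be more robust but depends on page markup not shown in the question.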