我正在使用Python 3和Selenium来从网站上获取一些图像链接,如下所示:
import sys
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
link_xpath = '/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img'
link_path = driver.find_element_by_xpath(link_xpath).text
print(link_path)
driver.quit()
解析该URL时,您可以在页面中间看到有问题的图像。在Google Chrome浏览器中右键单击并检查元素时,可以在Chrome开发工具中右键单击元素本身,并获取该图像的xpath。
为了我,一切看起来都很正常,但是在运行上述代码时,出现以下错误:
Traceback (most recent call last):
File "G:\folder\folder\testfilepy", line 16, in <module>
link_path = driver.find_element_by_xpath(link_xpath).text
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
return self.find_element(by=By.XPATH, value=xpath)
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
'value': value})['value']
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img"}
(Session info: headless chrome=83.0.4103.61)
谁能告诉我为什么Selenium无法找到提供的xpath?
src
属性,您需要为WebDriverWait引入visibility_of_element_located()
,并且可以使用以下Locator Strategies中的任何一个:CSS_SELECTOR
:options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--headless')
options.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.o-layout__item div.c-bezel.programme-content__image>img"))).get_attribute("src"))
XPATH
:options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--headless')
options.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='o-layout__item']//div[@class='c-bezel programme-content__image']/img"))).get_attribute("src"))
https://images.metadata.sky.com/pd-image/251eeec2-acb3-4733-891b-60f10f2cc28c/16-9/640
注:您必须添加以下导入:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
参考您可以在NoSuchElementException中找到一些详细的讨论:
xpath
,但不要使用绝对路径,因此非常容易更改。试试这个相对的xpath
://div[@class="c-bezel programme-content__image"]//img
。为了达到您的意思,请使用.get_attribute("src")
而不是.text
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//div[@class="c-bezel programme-content__image"]//img')))
print(element.get_attribute("src"))
driver.quit()
或更好的方法,使用css选择器。这应该更快:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.c-bezel.programme-content__image > img')))
chrome_options.add_argument('window-size=1920x1080')
希望获得帮助。