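I'm trying to get the absolute XPath of an element, but I get a different result from the one I expect. I want the full XPath of the search button on Google. The code is: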
import time
import random
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from lxml import etree
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--log-level=3")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=options)
main_link = r"https://www.google.com"
driver.get(main_link)
time.sleep(5)
with open("dom.xml", "w", encoding="utf-8") as domfile:
    domfile.write(driver.page_source)
tree = etree.parse("dom.xml",parser=etree.XMLParser(recover=True))
print(tree)
element = tree.xpath("(//input[@class='gNO89b'])[2]")
print(element)
# trying to print the absolute XPath
print(tree.getpath(element[0]))
The output should be:
/html/body/div[1]/div[3]/form/div[1]/div[1]/div[4]/center/input[1]
But it gives me:
/html/head/meta/meta/meta/link/script[6]/br/body/div/div[2]/div[2]/form/div/div/div/div[2]/div[2]/div[7]/center/input
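This happens because you are parsing HTML output with an XML parser. The two formats differ: HTML void elements such as meta, link, and br have no closing tags, so an XML parser in recovery mode treats them as still open and nests the elements that follow inside them. That is why your path runs through /html/head/meta/meta/meta/link/... instead of /html/body/... The best way to preserve the HTML structure is to parse the page source as an HTML string with lxml.html: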
import time
import random
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import lxml.html
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--log-level=3")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=options)
main_link = r"https://www.google.com"
driver.get(main_link)
time.sleep(5)
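# Parse the page source with the HTML parser instead of the XML parser
tree = lxml.html.fromstring(driver.page_source)
# getpath() is a method of the ElementTree, not of the root element
root = tree.getroottree()
element = tree.xpath("(//input[@class='gNO89b'])[2]")
print(root.getpath(element[0]))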
Output:
/html/body/div[1]/div[3]/form/div[1]/div[1]/div[4]/center/input[1]
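To see the difference on something smaller than the Google page, here is a minimal sketch. The HTML snippet and the variable names are made up for illustration; only the lxml calls are the ones already used above. It parses the same string once with the recovering XML parser and once with lxml.html, then compares the absolute paths of the input element.

```python
from lxml import etree
import lxml.html

# Hypothetical snippet: <meta> and <br> are void elements with no closing tag
html_text = (
    "<html><head><meta charset='utf-8'><title>t</title></head>"
    "<body><div><br><input name='q'></div></body></html>"
)

# XML parser in recovery mode: unclosed void elements stay open,
# so later elements end up nested in unexpected places
xml_root = etree.fromstring(html_text, parser=etree.XMLParser(recover=True))
xml_input = xml_root.xpath("//input")[0]
xml_path = xml_root.getroottree().getpath(xml_input)

# HTML parser: knows that <meta> and <br> cannot have children
html_root = lxml.html.fromstring(html_text)
html_input = html_root.xpath("//input")[0]
html_path = html_root.getroottree().getpath(html_input)

print("XML parser: ", xml_path)
print("HTML parser:", html_path)  # /html/body/div/input
```

The exact path printed for the XML parser depends on how the underlying libxml2 version recovers from the mismatched tags, so it is not shown here. What matters is that it no longer matches the path a browser would report.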
If your goal is to serialize the HTML document as an XML document after parsing, you may need to apply some manual preprocessing first.
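As a rough sketch of that serialization route, assuming the page contains nothing that XML cannot represent: parse with lxml.html first, then write the tree out with etree.tostring(..., method="xml"). The snippet and the variable names are hypothetical, and a real page may still need the preprocessing mentioned above before the result is usable.

```python
from lxml import etree
import lxml.html

# Hypothetical snippet standing in for driver.page_source
html_text = "<html><body><div><br><input name='q'></div></body></html>"

# Step 1: build the tree with the HTML parser so void elements are handled
root = lxml.html.fromstring(html_text)

# Step 2: serialize as XML; void elements become self-closing tags
xml_text = etree.tostring(root, method="xml", encoding="unicode")

# Step 3: the result can now be read back by the strict XML parser
reparsed = etree.fromstring(xml_text)
path = reparsed.getroottree().getpath(reparsed.xpath("//input")[0])

print(xml_text)
print(path)  # /html/body/div/input
```

Because the structure is fixed by the HTML parser before serialization, the path obtained from the reparsed XML matches the one from the HTML tree.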