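I want to get elements from a website using XPath expressions. I am using the built-in Microsoft library MSXML2 for this, but it has no
getElementByXpath()
method. I found the very nice function getXPathElement(),
shown below, in this old thread. It can get an element by XPath, but it only works with full XPath
expressions, and I need to find an element with an XPath that matches on the text it contains.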
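For example, suppose I want to get the element containing the text
HTML Editors
from the URL https://www.w3schools.com/html/.
The full XPath is "/html/body/div[4]/div/div/a[3]",
but one option for a text-based XPath would be "//a[text()[contains(.,'HTML Editors')]]".
With the second XPath, the function fails. Is there a way to evaluate this kind of XPath expression?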
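By the way: I know Selenium has a FindByXpath option, but as far as I can tell that means installing the Selenium driver in a tricky way, because there is no direct binding for VBA. If possible, I would like to avoid installing anything extra, for office security reasons.
This is my current code: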
Sub Main()
    Dim url As String
    Dim oHttp As New MSXML2.XMLHTTP60
    Dim elem As HTMLBaseElement

    url = "https://www.w3schools.com/html/"
    oHttp.Open "GET", url, False
    oHttp.send

    Dim html As New HTMLDocument
    html.body.innerHTML = oHttp.responseText

    Set elem = getXPathElement("/html/body/div[4]/div/div/a[3]", html)
    ' ### it doesn't work with this xpath
    'Set elem = getXPathElement("//a[text()[contains(.,'HTML Editors')]]", html)
    Debug.Print elem.innerText
End Sub
Public Function getXPathElement(sXPath As String, objElement As Object) As HTMLBaseElement
    Dim sXPathArray() As String
    Dim sNodeName As String
    Dim sNodeNameIndex As String
    Dim sRestOfXPath As String
    Dim lNodeIndex As Long
    Dim lCount As Long

    ' Split the xpath statement and take its first step, e.g. "div[4]"
    sXPathArray = Split(sXPath, "/")
    sNodeNameIndex = sXPathArray(1)
    If Not InStr(sNodeNameIndex, "[") > 0 Then
        ' No index given: take the first child with this name
        sNodeName = sNodeNameIndex
        lNodeIndex = 1
    Else
        ' Separate the node name from its numeric index
        sXPathArray = Split(sNodeNameIndex, "[")
        sNodeName = sXPathArray(0)
        lNodeIndex = CLng(Left(sXPathArray(1), Len(sXPathArray(1)) - 1))
    End If
    ' Whatever follows the first step is resolved recursively
    sRestOfXPath = Right(sXPath, Len(sXPath) - (Len(sNodeNameIndex) + 1))

    Set getXPathElement = Nothing
    ' Count down through same-named children until the requested one is reached
    For lCount = 0 To objElement.ChildNodes().Length - 1
        If UCase(objElement.ChildNodes().Item(lCount).nodeName) = UCase(sNodeName) Then
            If lNodeIndex = 1 Then
                If sRestOfXPath = "" Then
                    Set getXPathElement = objElement.ChildNodes().Item(lCount)
                Else
                    Set getXPathElement = getXPathElement(sRestOfXPath, objElement.ChildNodes().Item(lCount))
                End If
            End If
            lNodeIndex = lNodeIndex - 1
        End If
    Next lCount
End Function
Update
Based on the link shared by @SiebeJongebloed, I changed the code of the main macro to use
SelectNodes()
and now I get a run-time error. What am I doing wrong?
Sub Main()
    Dim url As String
    Dim oHttp As New MSXML2.XMLHTTP60
    Dim elem As MSXML2.IXMLDOMNode

    url = "https://www.w3schools.com/html/"
    oHttp.Open "GET", url, False
    oHttp.send

    Dim html As New HTMLDocument
    html.body.innerHTML = oHttp.responseText

    Set doc = New MSXML2.DOMDocument60
    doc.SetProperty "SelectionLanguage", "XPath"
    doc.Load oHttp.responseText
    Set elem = doc.SelectNodes("//a[text()[contains(.,'HTML Editors')]")
End Sub
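Change
//a[text()[contains(.,'HTML Editors')]
to
//a[text()[contains(.,'HTML Editors')]]
The first expression is missing the closing bracket of the outer predicate, so it is not a valid XPath expression.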
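
Beyond the bracket, a few other things in the updated macro may also be worth checking. The following is only a sketch, assuming references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library" are set. It uses LoadXML (which parses a string, whereas Load expects a file path or URL) and SelectSingleNode (which returns a single node, whereas SelectNodes returns a node list). The XPath route can only succeed if the page happens to be well-formed XML with no default namespace, which real-world HTML frequently is not. For that reason the sketch also includes FindByTagAndText, a hypothetical helper that is not XPath at all: it scans elements by tag name in MSHTML and only approximates contains(), because innerText also includes the text of descendant elements.

```vba
' Requires references: "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
Option Explicit

' Attempt 1: XPath via MSXML. Assumes the markup parses as well-formed XML.
Sub FindWithXPath()
    Dim oHttp As New MSXML2.XMLHTTP60
    Dim doc As New MSXML2.DOMDocument60
    Dim elem As MSXML2.IXMLDOMNode

    oHttp.Open "GET", "https://www.w3schools.com/html/", False
    oHttp.send

    doc.async = False
    doc.setProperty "SelectionLanguage", "XPath"

    ' LoadXML returns False when the string cannot be parsed
    If Not doc.LoadXML(oHttp.responseText) Then
        Debug.Print "Could not parse as XML: " & doc.parseError.reason
        Exit Sub
    End If

    ' SelectSingleNode returns Nothing when there is no match
    Set elem = doc.SelectSingleNode("//a[text()[contains(.,'HTML Editors')]]")
    If elem Is Nothing Then
        Debug.Print "No match"
    Else
        Debug.Print elem.Text
    End If
End Sub

' Attempt 2 (hypothetical helper, not XPath): return the first element with the
' given tag name whose innerText contains the search text (case-sensitive).
Function FindByTagAndText(html As HTMLDocument, tagName As String, needle As String) As Object
    Dim candidate As Object

    Set FindByTagAndText = Nothing
    For Each candidate In html.getElementsByTagName(tagName)
        If InStr(1, candidate.innerText, needle, vbBinaryCompare) > 0 Then
            Set FindByTagAndText = candidate
            Exit Function
        End If
    Next candidate
End Function
```

With html filled in as in the original Main, the helper could be called as Set elem = FindByTagAndText(html, "a", "HTML Editors"), followed by a check for Nothing before reading elem.innerText.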