How to search by XPath using VBA and MSXML2?


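I want to get elements from a website using an XPath expression. I'm using the built-in Microsoft library MSXML2 for this, but it has no getElementByXpath() method. In this old thread I found the very nice function getXPathElement() shown below, which gets an element by XPath. However, it only works with full XPath expressions, and I need to find an element that contains some text via XPath.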

For example, to get the element containing the text HTML Editors from the URL

https://www.w3schools.com/html/

the full XPath is
full Xpath = "/html/body/div[4]/div/div/a[3]"
but one option for a text-based XPath could be
Xpath = "//a[text()[contains(.,'HTML Editors')]]"

With the second XPath, the function fails. Is there a way to find elements using this kind of XPath expression?

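By the way: I know Selenium has a FindByXpath option, but as far as I can tell that means installing the Selenium driver in a tricky way, since there is no direct binding for VBA. For office security reasons, I'd like to avoid installing anything else if possible.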

Here is my current code:

Sub Main()
Dim url As String
Dim oHttp As New MSXML2.XMLHTTP60
Dim elem As HTMLBaseElement


    url = "https://www.w3schools.com/html/"
    oHttp.Open "GET", url, False
    oHttp.send
    
    Dim html As New HTMLDocument
    html.body.innerHTML = oHttp.responseText
    Set elem = getXPathElement("/html/body/div[4]/div/div/a[3]", html)

    ' ### with this xpath it doesn't work
    'Set elem = getXPathElement("//a[text()[contains(.,'HTML Editors')]]", html)

    Debug.Print elem.innerText
    
End Sub

Public Function getXPathElement(sXPath As String, objElement As Object) As HTMLBaseElement
    ' Walks an absolute XPath such as /html/body/div[4]/div/div/a[3] one step at a time.
    ' Only plain element names with optional numeric indexes are supported;
    ' "//" and predicates such as [contains(...)] are not handled.
    Dim sXPathArray() As String
    Dim sNodeName As String
    Dim sNodeNameIndex As String
    Dim sRestOfXPath As String
    Dim lNodeIndex As Long
    Dim lCount As Long

    ' Split the xpath statement; element 1 is the first step, e.g. "html" or "div[4]"
    sXPathArray = Split(sXPath, "/")
    sNodeNameIndex = sXPathArray(1)
    If Not InStr(sNodeNameIndex, "[") > 0 Then
        sNodeName = sNodeNameIndex
        lNodeIndex = 1
    Else
        ' "div[4]" -> node name "div", index 4
        sXPathArray = Split(sNodeNameIndex, "[")
        sNodeName = sXPathArray(0)
        lNodeIndex = CLng(Left(sXPathArray(1), Len(sXPathArray(1)) - 1))
    End If
    ' Remaining path after the first step, still starting with "/"
    sRestOfXPath = Right(sXPath, Len(sXPath) - (Len(sNodeNameIndex) + 1))

    Set getXPathElement = Nothing
    ' Find the lNodeIndex-th child with a matching name, then recurse into it
    For lCount = 0 To objElement.ChildNodes().Length - 1
        If UCase(objElement.ChildNodes().Item(lCount).nodeName) = UCase(sNodeName) Then
            If lNodeIndex = 1 Then
                If sRestOfXPath = "" Then
                    Set getXPathElement = objElement.ChildNodes().Item(lCount)
                Else
                    Set getXPathElement = getXPathElement(sRestOfXPath, objElement.ChildNodes().Item(lCount))
                End If
            End If
            lNodeIndex = lNodeIndex - 1
        End If
    Next lCount
End Function
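Why getXPathElement cannot handle the text-based expression follows from its first parsing step: it splits the path on "/" and treats element 1 of the result as the first node name. For an absolute path that is "html", but for an expression starting with "//" it is an empty string, so no child node matches, the function returns Nothing, and elem.innerText then raises an error. The hypothetical helper FirstStepName below (not part of the original code) is a minimal sketch that reproduces only that step. A predicate such as [contains(...)] in a step would also break the index parsing, because CLng expects a number.

```vba
' Hypothetical demo: mirrors the first parsing step of getXPathElement,
' which takes element 1 of Split(sXPath, "/") as the node name to match.
Public Function FirstStepName(ByVal sXPath As String) As String
    Dim parts() As String
    parts = Split(sXPath, "/")
    If UBound(parts) >= 1 Then FirstStepName = parts(1)
End Function

Sub DemoFirstStep()
    ' Absolute path: the first step is "html", prints [html]
    Debug.Print "[" & FirstStepName("/html/body/div[4]/div/div/a[3]") & "]"
    ' Path starting with "//": element 1 is empty, prints []
    Debug.Print "[" & FirstStepName("//a[text()[contains(.,'HTML Editors')]]") & "]"
End Sub
```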

Update

Based on the link shared by @SiebeJongebloed, I changed the code of the main macro to use SelectNodes(), and now I get the runtime error shown below. What am I doing wrong?

Sub Main()
Dim url As String
Dim oHttp As New MSXML2.XMLHTTP60
Dim elem As MSXML2.IXMLDOMNode


    url = "https://www.w3schools.com/html/"
    oHttp.Open "GET", url, False
    oHttp.send
    
    Dim html As New HTMLDocument
    html.body.innerHTML = oHttp.responseText
    
    Set doc = New MSXML2.DOMDocument60
    doc.SetProperty "SelectionLanguage", "XPath"
    doc.Load oHttp.responseText
    
    Set elem = doc.SelectNodes("//a[text()[contains(.,'HTML Editors')]")
    
End Sub

[Screenshot of the runtime error]
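A few other details in this macro are worth checking independently of the XPath string. In MSXML, DOMDocument60.Load expects a URL or file path, while LoadXML parses a string; SelectNodes returns an IXMLDOMNodeList rather than a single IXMLDOMNode, so assigning its result to elem would likely be a type mismatch; and MSXML parses XML only, so a real HTML page such as the w3schools one is often not well-formed XML and may fail to load at all. The following is a minimal sketch under those assumptions, not a drop-in fix: it uses late binding (CreateObject) so no project reference is needed, and the helper name FirstMatchText and the sample markup are hypothetical.

```vba
' Hypothetical helper: returns the text of the first node matching sXPath,
' or "" if the markup does not parse as XML or nothing matches.
Public Function FirstMatchText(ByVal sMarkup As String, ByVal sXPath As String) As String
    Dim doc As Object
    Dim node As Object

    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    doc.SetProperty "SelectionLanguage", "XPath"   ' already the default in MSXML 6.0

    ' LoadXML parses a string; Load would treat the string as a URL or file path.
    If Not doc.LoadXML(sMarkup) Then
        Debug.Print "Parse error: " & doc.parseError.reason
        Exit Function
    End If

    ' SelectSingleNode returns one node (or Nothing); SelectNodes returns a node list.
    Set node = doc.SelectSingleNode(sXPath)
    If Not node Is Nothing Then FirstMatchText = node.Text
End Function

Sub DemoFirstMatchText()
    Dim sample As String
    ' Small well-formed sample, not the real w3schools markup
    sample = "<div><a href='a.asp'>HTML Home</a><a href='b.asp'>HTML Editors</a></div>"
    Debug.Print FirstMatchText(sample, "//a[text()[contains(.,'HTML Editors')]]")   ' prints HTML Editors
End Sub
```

If the full list of matches is needed instead, SelectNodes can be assigned to a separate Object variable and iterated with For Each.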

excel vba xpath domdocument msxml2
1 Answer

Change

//a[text()[contains(.,'HTML Editors')]

to

//a[text()[contains(.,'HTML Editors')]]

The original expression opens two [ brackets but closes only one.
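Counting brackets makes the problem visible: with one ] missing, MSXML cannot parse the expression, which is likely the runtime error in the screenshot. A small check like the hypothetical BracketsBalanced helper below (a sketch, not part of MSXML) can catch this before calling SelectNodes. It only tracks square brackets and skips characters inside quoted literals, so it is not a full XPath validator.

```vba
' Hypothetical helper: True if every "[" in the expression has a matching "]",
' ignoring brackets inside quoted string literals.
Public Function BracketsBalanced(ByVal sXPath As String) As Boolean
    Dim i As Long
    Dim depth As Long
    Dim ch As String
    Dim quoteChar As String   ' "" when not inside a string literal

    For i = 1 To Len(sXPath)
        ch = Mid$(sXPath, i, 1)
        If quoteChar <> "" Then
            If ch = quoteChar Then quoteChar = ""   ' end of literal
        ElseIf ch = "'" Or ch = """" Then
            quoteChar = ch                          ' start of literal
        ElseIf ch = "[" Then
            depth = depth + 1
        ElseIf ch = "]" Then
            depth = depth - 1
            If depth < 0 Then Exit Function         ' "]" without an open "[", returns False
        End If
    Next i

    BracketsBalanced = (depth = 0 And quoteChar = "")
End Function

Sub DemoBrackets()
    Debug.Print BracketsBalanced("//a[text()[contains(.,'HTML Editors')]")    ' False
    Debug.Print BracketsBalanced("//a[text()[contains(.,'HTML Editors')]]")   ' True
End Sub
```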
© www.soinside.com 2019 - 2024. All rights reserved.