How do I get specific data from a website (HTML)?

Question

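I am trying to extract specific data from a website and load it into my Excel worksheet.

For example, I want to extract the Metascore from https://www.metacritic.com/game/sid-meiers-civilization-vi/. This value sits inside a specific element.

[Image: the element I want to extract]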

Private Sub URL_Load(ByVal sURL As String)
    'Declare variables
    Dim appInternetExplorer As Object
    Dim htmlTxt As String
    Dim spanTxt2 As String
    
    Set appInternetExplorer = CreateObject("InternetExplorer.Application")
    appInternetExplorer.navigate sURL
    Do: Loop Until appInternetExplorer.Busy = False
    Do: Loop Until appInternetExplorer.Busy = False
    spanTxt = appInternetExplorer.document.DocumentElement.all.tags("SPAN")
    'objSelect = appInternetExplorer.document.DocumentElement.all.tags("SPAN")
    Debug.Print htmlTxt
    Set appInternetExplorer = Nothing
    Close
    'Do something with the text here: parse, output, save
    MsgBox "The text has been read!"
End Sub

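In this code, the variable "spanTxt" ends up holding the following value: X. Unfortunately, that is not the element I want to extract.

How can I extract a specific element?

I have tried: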

htmlTxt = appInternetExplorer.document.DocumentElement.outerHTML
htmlTxt1(1) = appInternetExplorer.document.DocumentElement.innerHTML
htmlTxt2 = appInternetExplorer.document.DocumentElement.innerText

Answer 1

Since you are querying a static page, you don't actually need to be familiar with any web technologies or use any external objects:

Public Sub ImportWebPage()
    Dim ws As Worksheet: Set ws = Sheet1    'change as required'
    With ws.QueryTables.Add("URL;https://www.metacritic.com/game/sid-meiers-civilization-vi/", ws.Range("A1"))
        .WebSelectionType = xlEntirePage
        .BackgroundQuery = False
        .Refresh
    End With
    Dim metascore As Double
    With ws
        metascore = .Columns(1).Find("meta").Offset(1).Value
        .Columns(1).Clear
        .QueryTables(1).Delete
    End With
    MsgBox "Metascore is " & metascore
End Sub

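Note that this method imports the entire page (so you need a "spare" range to hold it) and then uses ordinary worksheet functionality to locate the Metascore.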
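One caveat with the lookup above: `Find` returns `Nothing` when the label is not present, and calling `.Offset` on `Nothing` raises a runtime error. A minimal sketch of a guarded lookup follows, assuming the imported page is still in column A. The helper name `ValueBelowLabel` is hypothetical and not part of the answer's code.

```vba
' Hypothetical helper: returns the value of the cell directly below the first
' cell in column A whose text contains `label`. Returns Empty if no such cell
' is found (or if the cell below the label is itself empty).
Public Function ValueBelowLabel(ByVal ws As Worksheet, ByVal label As String) As Variant
    Dim hit As Range
    Set hit = ws.Columns(1).Find(What:=label, LookIn:=xlValues, _
                                 LookAt:=xlPart, MatchCase:=False)
    If hit Is Nothing Then
        ValueBelowLabel = Empty
    Else
        ValueBelowLabel = hit.Offset(1, 0).Value
    End If
End Function
```

It could be called as `v = ValueBelowLabel(Sheet1, "meta")` right after `.Refresh`, followed by an `IsEmpty(v)` check before the value is converted to a number.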

Answer 2

Stop using IE. It is obsolete. Here is an example of how to get the value you want using XHR (XML HTTP request):

Sub ExampleToGetMetaScore()

    Const url As String = "https://www.metacritic.com/game/sid-meiers-civilization-vi/"
    Dim doc As Object
    Dim nodeScore As Object
    
    Set doc = CreateObject("htmlFile")
    
    With CreateObject("MSXML2.XMLHTTP.6.0")
        .Open "GET", url, False
        .send
        
        If .Status = 200 Then
            doc.body.innerHTML = .responseText
            Set nodeScore = doc.getElementsByClassName("c-productScoreInfo_scoreNumber")(0)
            MsgBox nodeScore.innerText
        Else
            MsgBox "Page not loaded. HTTP status " & .Status
        End If
    End With
End Sub

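You wrote that you are a VBA beginner. But VBA alone is not enough for web scraping. You also need to be familiar with the following technologies:

XHR (downloading files from a web server)
SeleniumBasic (automating Chrome/Edge when XHR doesn't work)
HTML (to locate the content within the page structure and find a way to address it)
The HTML DOM (provides methods such as getElementsByClassName())
DOM = Document Object Model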
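To tie the XHR and DOM items back to the original goal of getting the value into a worksheet, here is a rough sketch that wraps the same calls as the example above in a reusable function. The names `GetTextByClass` and `WriteMetascoreToSheet` are hypothetical, and `Sheet1` and cell `A1` are placeholder targets. It assumes the page still uses the class name `c-productScoreInfo_scoreNumber` and that `getElementsByClassName` is available on the late-bound `htmlfile` object, as in the example above.

```vba
' Hypothetical helper: downloads `url` via XHR and returns the inner text of
' the first element with the given class, or "" if the request fails or no
' such element exists.
Public Function GetTextByClass(ByVal url As String, ByVal className As String) As String
    Dim doc As Object
    Dim nodes As Object

    Set doc = CreateObject("htmlfile")

    With CreateObject("MSXML2.XMLHTTP.6.0")
        .Open "GET", url, False          'False = synchronous request
        .send
        If .Status <> 200 Then Exit Function
        doc.body.innerHTML = .responseText
    End With

    Set nodes = doc.getElementsByClassName(className)
    If nodes.Length > 0 Then GetTextByClass = Trim$(nodes(0).innerText)
End Function

' Example use: write the score into a cell instead of showing a message box.
Public Sub WriteMetascoreToSheet()
    Dim score As String
    score = GetTextByClass("https://www.metacritic.com/game/sid-meiers-civilization-vi/", _
                           "c-productScoreInfo_scoreNumber")
    If Len(score) > 0 Then
        Sheet1.Range("A1").Value = score   'placeholder target cell
    Else
        MsgBox "Metascore not found"
    End If
End Sub
```

Returning an empty string rather than raising an error keeps the calling code simple, at the cost of not distinguishing a failed request from a missing element.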
