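I am trying to extract specific data from a website and load it into my Excel worksheet.
For example, I would like to extract the Metascore from https://www.metacritic.com/game/sid-meiers-civilization-vi/. This information sits inside a specific element.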
Private Sub URL_Load(ByVal sURL As String)
    'Declare variables
    Dim appInternetExplorer As Object
    Dim htmlTxt As String
    Dim spanTxt2 As String
    Set appInternetExplorer = CreateObject("InternetExplorer.Application")
    appInternetExplorer.navigate sURL
    Do: Loop Until appInternetExplorer.Busy = False
    Do: Loop Until appInternetExplorer.Busy = False
    spanTxt = appInternetExplorer.document.DocumentElement.all.tags("SPAN")
    'objSelect = appInternetExplorer.document.DocumentElement.all.tags("SPAN")
    Debug.Print htmlTxt
    Set appInternetExplorer = Nothing
    Close
    'Do something with the text here: parse, output, save
    MsgBox "The text has been read!"
End Sub
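In this code, the variable "spanTxt" ends up with the following value: X. Unfortunately, that is not the element I want to extract.
How can I extract a specific element?
I have tried: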
htmlTxt = appInternetExplorer.document.DocumentElement.outerHTML
htmlTxt1(1) = appInternetExplorer.document.DocumentElement.innerHTML
htmlTxt2 = appInternetExplorer.document.DocumentElement.innerText
Since you are querying a static page, you don't actually need to be familiar with any web technologies or use any external objects:
Public Sub ImportWebPage()
    Dim ws As Worksheet: Set ws = Sheet1 'change as required
    Dim hit As Range
    'Import the whole page into column A
    With ws.QueryTables.Add("URL;https://www.metacritic.com/game/sid-meiers-civilization-vi/", ws.Range("A1"))
        .WebSelectionType = xlEntirePage
        .BackgroundQuery = False
        .Refresh
    End With
    'The score is expected in the cell below the first cell containing "meta"
    Set hit = ws.Columns(1).Find("meta")
    If hit Is Nothing Then
        MsgBox "Could not find the Metascore in the imported page."
    Else
        MsgBox "Metascore is " & hit.Offset(1).Value
    End If
    'Remove the imported text and the query
    ws.Columns(1).Clear
    ws.QueryTables(1).Delete
End Sub
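Note that this method imports the complete page (so you need a "spare" range to hold it) and then uses ordinary worksheet functionality to locate the Metascore.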
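If no spare range is available, one option is to run the query on a temporary worksheet and delete that sheet afterwards. The following is only a sketch under that assumption: the name ImportOnScratchSheet is made up for illustration, and the idea that the score sits in the cell below the first cell containing "meta" is taken from the code above and may stop working if the page layout changes.

```vba
Public Sub ImportOnScratchSheet()
    Const URL As String = "https://www.metacritic.com/game/sid-meiers-civilization-vi/"
    Dim scratch As Worksheet
    Dim qt As QueryTable
    Dim hit As Range
    Dim result As String

    'Temporary sheet so that no existing data is overwritten
    '(assumption: the workbook structure is not protected)
    Set scratch = ThisWorkbook.Worksheets.Add
    Set qt = scratch.QueryTables.Add("URL;" & URL, scratch.Range("A1"))
    With qt
        .WebSelectionType = xlEntirePage
        .BackgroundQuery = False
        .Refresh
    End With

    'Same lookup as above: the cell below the first cell containing "meta"
    Set hit = scratch.Columns(1).Find(What:="meta", LookIn:=xlValues, LookAt:=xlPart)
    If hit Is Nothing Then
        result = "Metascore not found"
    Else
        result = "Metascore is " & hit.Offset(1).Value
    End If

    'Remove the query and the temporary sheet without a confirmation prompt
    qt.Delete
    Application.DisplayAlerts = False
    scratch.Delete
    Application.DisplayAlerts = True

    MsgBox result
End Sub
```

The result is stored in a string before the sheet is deleted, because the found cell no longer exists once the temporary sheet is gone.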
Don't use IE anymore. It is obsolete. Here is an example of how to get the value you want using XHR (XML HTTP request):
Sub ExampleToGetMetaScore()
    Const url As String = "https://www.metacritic.com/game/sid-meiers-civilization-vi/"
    Dim doc As Object
    Dim nodeScore As Object
    Set doc = CreateObject("htmlFile")
    With CreateObject("MSXML2.XMLHTTP.6.0")
        'Synchronous GET request
        .Open "GET", url, False
        .send
        If .Status = 200 Then
            'Parse the response and take the first element with the score class
            doc.body.innerHTML = .responseText
            Set nodeScore = doc.getElementsByClassName("c-productScoreInfo_scoreNumber")(0)
            If nodeScore Is Nothing Then
                MsgBox "Score element not found."
            Else
                MsgBox nodeScore.innerText
            End If
        Else
            MsgBox "Page not loaded. HTTP status " & .Status
        End If
    End With
End Sub
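Since the original goal was to get the value into a worksheet, the request can be wrapped in a function and its result written to a cell. This is a minimal sketch rather than a definitive implementation: the names GetMetaScore and WriteMetaScoreToSheet, the target cell A1 on the active sheet, and the empty-string return value for failures are all assumptions made for illustration. The class name is the one used in the example above and may change whenever the site is redesigned.

```vba
'Hypothetical helper: returns the score text, or "" if the request or the lookup fails
Function GetMetaScore(ByVal url As String) As String
    Dim doc As Object
    Dim nodeScore As Object

    Set doc = CreateObject("htmlFile")
    With CreateObject("MSXML2.XMLHTTP.6.0")
        .Open "GET", url, False
        .send
        If .Status <> 200 Then Exit Function
        doc.body.innerHTML = .responseText
    End With

    'Class name taken from the example above; it is an assumption about the current page
    Set nodeScore = doc.getElementsByClassName("c-productScoreInfo_scoreNumber")(0)
    If Not nodeScore Is Nothing Then GetMetaScore = Trim$(nodeScore.innerText)
End Function

'Hypothetical caller: writes the score to A1 of the active sheet (assumed target cell)
Sub WriteMetaScoreToSheet()
    Dim score As String
    score = GetMetaScore("https://www.metacritic.com/game/sid-meiers-civilization-vi/")
    If Len(score) = 0 Then
        MsgBox "Metascore could not be read."
    Else
        ActiveSheet.Range("A1").Value = score
    End If
End Sub
```

Keeping the request in its own function makes it easy to call for several URLs in a loop, for example one game page per worksheet row.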
You wrote that you are a VBA beginner. But VBA alone is not enough for web scraping. You also need to be familiar with the following technologies: