Dependent dropdown options do not load when web scraping

Question

I am trying to scrape data from the following website: http://www.equibase.com/stats/View.cfm?tf=meet&tb=jockey&rbt=TB

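I want the VBA code to perform the following steps:

  1. Go to the URL
  2. Click "Jockey"
  3. Select a track from the dropdown list, for example "ALBUQUERQUE"
  4. Depending on the selected track, the page loads the "Available Meets" dropdown. I then want to select the first meet from that dropdown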

The problem with my code is that, although it selects "ALBUQUERQUE" in the first dropdown, the data for the second dropdown never loads.

Sub extract()

Dim ie As New InternetExplorer
Dim doc As New HTMLDocument
Dim optionText As String

optionText = "ALBUQUERQUE"
ie.Visible = True
Url = "http://www.equibase.com/stats/View.cfm?tf=meet&tb=jockey&rbt=TB"
ie.Navigate Url

Application.StatusBar = "Navigating to URL..."

Do
DoEvents
Loop Until ie.ReadyState = READYSTATE_COMPLETE

Do While ie.Busy
    DoEvents
Loop

Set doc = ie.Document

Set jockeyButton = doc.getElementsByClassName("scMainTab")

For Each Button In jockeyButton
    If Button.getAttribute("href") = "#jockey" Then
        Button.Click
        Exit For
    End If
Next Button

Set tracksDropdown = doc.getElementById("selAvailTracks")

''AT THIS POINT, IT SHOULD AUTOMATICALLY LOAD THE SECOND DROP DOWN BUT IT IS NOT HAPPENING


ie.Quit
Set ie = Nothing

End Sub

Please help me work out how to select the first item from the second dropdown.

excel vba web-scraping screen-scraping
1 Answer

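The magic words are "HTML events". For a selection in a dropdown to take effect, its change event must be triggered. Otherwise nothing happens.

You can't assign "ALBUQUERQUE" to the first dropdown directly. The value of the ALBUQUERQUE option is "ALB:USA":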

<select id="selAvailTracks" name="selAvailTracks" class="scTrackSelects">
  <option value=""> Available Tracks </option>
  <option value="ALB:USA">ALBUQUERQUE</option>
  <option value="AQU:USA">AQUEDUCT</option>
  <option value="ARP:USA">ARAPAHOE PARK</option>
  <option value="AZD:USA">ARIZONA DOWNS</option>
  <option value="AP :USA">ARLINGTON</option>
  <option value="ASD:CAN">ASSINIBOIA DOWNS</option>
  <option value="ATO:USA">ATOKAD DOWNS</option>
  <option value="BEL:USA">BELMONT PARK</option>
  ...
  ...
  ...
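
If only the visible track name is known, the option value could be looked up from the dropdown itself instead of being hard-coded. The following is a minimal sketch, assuming the select element exposes the usual Options collection with Text and Value on each entry; GetOptionValueByText is a hypothetical helper name and is not part of the macro in this answer.

```vba
' Returns the value attribute of the <option> whose visible text matches
' optionText (case-insensitive, leading/trailing spaces ignored).
' Returns an empty string if no option matches.
' NOTE: GetOptionValueByText is a hypothetical helper for illustration.
Public Function GetOptionValueByText(selectElement As Object, _
                                     optionText As String) As String
    Dim i As Long
    Dim currentText As String

    For i = 0 To selectElement.Options.Length - 1
        currentText = Trim$(selectElement.Options.Item(i).Text)
        If StrComp(currentText, Trim$(optionText), vbTextCompare) = 0 Then
            GetOptionValueByText = selectElement.Options.Item(i).Value
            Exit Function
        End If
    Next i

    ' No matching option found
    GetOptionValueByText = vbNullString
End Function
```

With the select element from the HTML above, calling GetOptionValueByText(nodeTracksDropdown, "ALBUQUERQUE") should yield "ALB:USA", which can then be assigned to the dropdown's Value property before the change event is fired.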

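Another way to select an entry is by the index of the desired element. The macro below uses this for the second dropdown.

Try this macro, which makes both selections, including the one in the second dropdown: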

Sub Extract()

'Declare all variables
Dim url As String
Dim browser As Object
Dim htmlDoc As Object
Dim nodeTracksDropdown As Object
Dim dateDropdown As Object
Dim trackInDropdown As String

  'Initialize variables
  trackInDropdown = "ALB:USA" 'You can also get this from a cell of a table
  url = "http://www.equibase.com/stats/View.cfm?tf=meet&tb=jockey&rbt=TB"

  'Initialize Internet Explorer, set visibility,
  'call URL and wait until page is fully loaded
  Set browser = CreateObject("internetexplorer.application")
  browser.Visible = True
  browser.navigate url
  Do Until browser.ReadyState = 4: DoEvents: Loop
  'Short break to load dynamic content
  Application.Wait (Now + TimeSerial(0, 0, 3))

  'Shortening document reference
  Set htmlDoc = browser.document

  'Get first dropdown, select track, trigger change event
  'and wait a second to set up the second dropdown
  Set nodeTracksDropdown = htmlDoc.getElementById("selAvailTracks")
  nodeTracksDropdown.Value = trackInDropdown
  Call TriggerEvent(htmlDoc, nodeTracksDropdown, "change")
  Application.Wait (Now + TimeSerial(0, 0, 1))

  'Get second dropdown, select the entry at index 1 (index 0 is presumably
  'a placeholder entry), trigger change event
  'and wait a second to set up the following elements
  Set dateDropdown = htmlDoc.getElementById("selAvailRaceMeets")
  dateDropdown.selectedIndex = 1
  Call TriggerEvent(htmlDoc, dateDropdown, "change")
  Application.Wait (Now + TimeSerial(0, 0, 1))

  'Do whatever you want here
  '...
  '...
  '...

  'Clean up
  'browser.Quit
  'Set browser = Nothing
  'Set nodeTracksDropdown = Nothing
  'Set dateDropdown = Nothing
End Sub

This procedure fires the HTML event:

Private Sub TriggerEvent(htmlDocument As Object, htmlElementWithEvent As Object, eventType As String)

  Dim theEvent As Object

  htmlElementWithEvent.Focus
  Set theEvent = htmlDocument.createEvent("HTMLEvents")
  theEvent.initEvent eventType, True, False
  htmlElementWithEvent.dispatchEvent theEvent
End Sub
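
The fixed one-second pauses in the macro may be too short on a slow connection. One possible alternative is to poll the second dropdown until its options have arrived. This is only a sketch under assumptions: WaitForOptions and its parameters are hypothetical names, and it assumes the page fills the existing select element rather than replacing it with a new one.

```vba
' Polls until the dropdown holds at least minOptionCount options or the
' timeout expires. Returns True if the options appeared in time.
' NOTE: WaitForOptions is a hypothetical helper for illustration.
Public Function WaitForOptions(selectElement As Object, _
                               minOptionCount As Long, _
                               timeoutSeconds As Double) As Boolean
    Dim deadline As Date

    ' One day equals 1 in VBA date arithmetic, so convert seconds to days
    deadline = Now + timeoutSeconds / 86400#

    Do
        If selectElement.Options.Length >= minOptionCount Then
            WaitForOptions = True
            Exit Function
        End If
        DoEvents   ' keep Excel and the browser responsive while waiting
    Loop While Now < deadline

    ' Timed out before enough options were present
    WaitForOptions = False
End Function
```

In the macro, a call such as WaitForOptions(dateDropdown, 2, 10) could stand in for the pause after the first change event, with selectedIndex set only when the function returns True. The count of 2 assumes the dropdown keeps a placeholder entry at index 0.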