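I would like to extract the following information for each mission on this website, using rvest.
I tried to extract the href of each mission under View Record, but had no luck: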
library(rvest)
results <- read_html("https://aad.archives.gov/aad/display-partial-records.jsp?dt=1802&sc=23947%2C23905%2C23906%2C23880%2C23907%2C23889%2C23890%2C23892%2C23893%2C23894&cat=all&tf=F&bc=%2Csl%2Cfd&q=&as_alq=&as_anq=&as_epq=&as_woq=&nfo_23947=V%2C1%2C1900&cl_23947=&nfo_23905=V%2C25%2C1900&op_23905=0&txt_23905=&nfo_23906=V%2C2%2C1900&cl_23906=03&nfo_23880=D%2C6%2C1966&op_23880=3&txt_23880=&txt_23880=&nfo_23907=D%2C6%2C1966&op_23907=3&txt_23907=&txt_23907=&nfo_23889=V%2C10%2C1900&op_23889=0&txt_23889=&nfo_23890=V%2C10%2C1900&op_23890=0&txt_23890=&nfo_23892=V%2C1%2C1900&cl_23892=E%2CX%2CA%2C7%2C%3D%2CQ%2CR%2CI%2C3%2CV&nfo_23893=V%2C2%2C1900&cl_23893=J0&nfo_23894=N%2C5%2C1900&op_23894=6&txt_23894=0&txt_23894=")
missions_url <- results %>%
  html_nodes("tbody td:nth-child(1)") %>%
  html_text()
Please let me know how to extract the above information. Thank you.
I was able to do this with the following code:
library(rvest)
library(RDCOMClient)
url <- "https://aad.archives.gov/aad/display-partial-records.jsp?dt=1802&sc=23947%2C23905%2C23906%2C23880%2C23907%2C23889%2C23890%2C23892%2C23893%2C23894&cat=all&tf=F&bc=%2Csl%2Cfd&q=&as_alq=&as_anq=&as_epq=&as_woq=&nfo_23947=V%2C1%2C1900&cl_23947=&nfo_23905=V%2C25%2C1900&op_23905=0&txt_23905=&nfo_23906=V%2C2%2C1900&cl_23906=03&nfo_23880=D%2C6%2C1966&op_23880=3&txt_23880=&txt_23880=&nfo_23907=D%2C6%2C1966&op_23907=3&txt_23907=&txt_23907=&nfo_23889=V%2C10%2C1900&op_23889=0&txt_23889=&nfo_23890=V%2C10%2C1900&op_23890=0&txt_23890=&nfo_23892=V%2C1%2C1900&cl_23892=E%2CX%2CA%2C7%2C%3D%2CQ%2CR%2CI%2C3%2CV&nfo_23893=V%2C2%2C1900&cl_23893=J0&nfo_23894=N%2C5%2C1900&op_23894=6&txt_23894=0&txt_23894=&rpp=50"
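# Drive Internet Explorer through COM (RDCOMClient is Windows-only)
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
# Fixed wait for the page to load; increase it on a slow connection
Sys.sleep(10)
doc <- IEApp$document()
# Handle to the results element (not used below)
web_Obj_Table <- doc$getElementByID("queryResults")
# Parse the rendered HTML with rvest and read every table on the page
html_Content <- read_html(doc$Body()$innerHtml())
list_Html_Table <- html_table(html_Content)
# The second table holds the query results
list_Html_Table[[2]]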
# A tibble: 51 × 11
`View Record` `FORCE NATIONALITY` `OPERATION NAME` `MAJOR PROVINCE CODE` `INITIATION DATE` `TERMINATION DATE` `BRIGADE DESIGNATION` `DIVISION DESIGNATION` `LOSS NATIONALITY`
<lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 NA "" "" "" "" "" "" "" ""
2 NA "RVN" "NGU HOANH SON" "Quang Nam" "10/22/2065" "" "" "" "RVN"
3 NA "Marine" "SUWANNEE" "Quang Nam" "08/13/1966" "" "9 MAR" "3 MAR" "RVN"
4 NA "RVN" "HOA TUYEN 147" "Quang Nam" "08/19/1966" "08/27/1966" "" "" "RVN"
5 NA "RVN" "HOA TUYEN 149" "Quang Nam" "09/01/1966" "" "51 INF" "2 INF" "RVN"
6 NA "RVN" "HOA TUYEN 149" "Quang Nam" "09/01/1966" "09/04/1966" "INF" "2 INF" "RVN"
7 NA "RVN" "HOA TUYEN 153" "Quang Nam" "09/19/1966" "09/24/1966" "" "" "RVN"
8 NA "RVN" "TAO THANH DUY" "Quang Nam" "09/24/1966" "09/26/1966" "" "" "RVN"
9 NA "RVN" "HOA TUYEN 154" "Quang Nam" "09/30/1966" "" "" "" "RVN"
10 NA "RVN" "TRUY KICH TRD 51" "Quang Nam" "10/16/1966" "" "51 INF" "2 INF" "RVN"
# ℹ 41 more rows
# ℹ 2 more variables: `LOSS CODE` <chr>, `NUMBER DESTROYED OF KILLED` <int>
# ℹ Use `print(n = ...)` to see more rows
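
In the output above the `View Record` column is all NA, because html_table() keeps only the text of each cell and the links themselves are dropped. To get the href of each record, the anchors can be selected from the same parsed document and read with html_attr("href") instead of html_text(). The following is only a minimal sketch on a made-up stand-in table: the `record-1.html` style links, the use of `queryResults` as the table id and the base URL are assumptions, so the selector will probably need adjusting to the real markup.

```r
library(rvest)

# Hypothetical stand-in for the rendered results table; the real markup may differ
page <- minimal_html('
  <table id="queryResults">
    <tr><th>View Record</th><th>OPERATION NAME</th></tr>
    <tr><td><a href="record-1.html"><img alt="View"></a></td><td>SUWANNEE</td></tr>
    <tr><td><a href="record-2.html"><img alt="View"></a></td><td>HOA TUYEN 147</td></tr>
  </table>
')

# Select the links in the first column and read their href attribute;
# html_text() would return "" here because the links contain only an icon
links <- page %>%
  html_elements("table#queryResults td:nth-child(1) a") %>%
  html_attr("href")

# Turn relative links into absolute ones (base URL is an assumption)
full_links <- xml2::url_absolute(links, "https://aad.archives.gov/aad/")
full_links
```

With the IE approach above, the same two steps should apply to `html_Content` in place of `page`.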