How to identify the html_node() selector when scraping a web page with rvest

Question

I want to use R to scrape two tables from the following page:

https://www.footywire.com/afl/footy/ft_match_statistics?mid=10751

I tried the following with the rvest package, but it does not work:

temp <- "https://www.footywire.com/afl/footy/ft_match_statistics?mid=10751"

temp2 <- temp %>% read_html %>% html_node('match-statistics-team1-row') %>% html_table()

Does anyone know what should go inside html_node()?

Thank you very much.

r web-scraping rvest
1 Answer

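Since match-statistics-team1-row is the id of the element you want to select, you must prefix it with #. In CSS selector syntax a bare name is read as a tag name, whereas #name matches the element whose id attribute is name.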

library(rvest)
url <- "https://www.footywire.com/afl/footy/ft_match_statistics?mid=10751"

url %>%
  read_html() %>%
  html_element("#match-statistics-team1-row") %>%
  html_table()
#> # A tibble: 29 × 435
#>    X1    X2    X3    X4    X5    X6    X7    X8    X9    X10   X11   X12   X13  
#>    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 "Ric… "Ric… "Ric… Rich… Coac… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  2 "Ric… "Coa…  <NA> <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  3 ""    "Pla… "Pla… K     HB    D     M     G     B     T     HO    GA    I50  
#>  4 "Pla… "K"   "HB"  D     M     G     B     T     HO    GA    I50   CL    CG   
#>  5 "Tim… "18"  "14"  32    3     0     2     8     0     0     4     5     8    
#>  6 "Dan… "17"  "10"  27    7     0     0     4     0     1     5     0     2    
#>  7 "Dus… "15"  "8"   23    6     1     0     0     0     1     8     1     5    
#>  8 "Dio… "10"  "13"  23    4     0     1     1     0     0     2     5     3    
#>  9 "Jac… "11"  "9"   20    1     0     2     9     0     0     5     2     6    
#> 10 "Tre… "9"   "9"   18    3     0     0     3     0     0     3     4     5    
#> # ℹ 19 more rows
#> # ℹ 422 more variables: X14 <chr>, X15 <chr>, X16 <chr>, X17 <chr>, X18 <chr>,
#> #   X19 <chr>, X20 <chr>, X21 <chr>, X22 <int>, X23 <int>, X24 <int>,
#> #   X25 <int>, X26 <int>, X27 <int>, X28 <int>, X29 <int>, X30 <int>,
#> #   X31 <int>, X32 <int>, X33 <int>, X34 <int>, X35 <int>, X36 <int>,
#> #   X37 <int>, X38 <int>, X39 <chr>, X40 <int>, X41 <int>, X42 <int>,
#> #   X43 <int>, X44 <int>, X45 <int>, X46 <int>, X47 <int>, X48 <int>, …
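
The same pattern can be tried without a network call. The following is a minimal sketch, assuming a small stand-in page built with rvest's minimal_html(); the ids team1-stats and team2-stats are hypothetical and are not taken from the footywire page. It shows that a bare name matches nothing, that the # prefix selects by id, and how two tables could be collected in one pass once both ids are known.

```r
library(rvest)

# Hypothetical stand-in page: two tables, each identified by an id attribute.
# The ids below are made up for illustration only.
page <- minimal_html('
  <table id="team1-stats">
    <tr><th>Player</th><th>K</th></tr>
    <tr><td>A</td><td>18</td></tr>
  </table>
  <table id="team2-stats">
    <tr><th>Player</th><th>K</th></tr>
    <tr><td>B</td><td>12</td></tr>
  </table>
')

# Without "#", the string is treated as a tag name, so no element matches
# and html_element() returns a missing node.
no_match <- page %>% html_element("team1-stats")

# With "#", the string is treated as an id selector.
team1 <- page %>%
  html_element("#team1-stats") %>%
  html_table()

# Collect both tables by looping over the id selectors.
ids <- c("#team1-stats", "#team2-stats")
tables <- lapply(ids, function(id) {
  page %>% html_element(id) %>% html_table()
})
```

On the real page, the id of the second table would have to be read from the page source (for example with the browser's inspector) before it can be added to the vector of selectors.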