How do I use a "for" loop to print tables in a web-scraping task with rvest and R?

Question (votes: 0, answers: 2)

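This HTML page contains at least three child <table> nodes under the root HTML node. How can I use a for loop on the result of my second line of code to print each table?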

library(rvest)

root_node <- read_html("https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems")

table_nodes <- html_nodes(root_node, "table")

I am mainly interested in the bike-sharing table, which is the first element, table_nodes[[1]].

r for-loop web-scraping rvest
2 Answers

1 vote

Here is a simple approach. Select the first "table.wikitable" node, then extract the table from that node.

library(rvest)

link <- "https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems"
root_node <- read_html(link)

root_node |>
  html_element("table.wikitable") |>
  html_table(header = TRUE)
#> # A tibble: 549 × 10
#>    Country   City   Name  System Opera…¹ Launc…² Disco…³ Stati…⁴ Bicyc…⁵ Daily…⁶
#>    <chr>     <chr>  <chr> <chr>  <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 Albania   Tiran… Ecov… ""     ""      March … ""      8       200     ""     
#>  2 Argentina Bueno… Ecob… "Sert… "Bike … 2010    ""      400     4000    "21917"
#>  3 Argentina Mendo… Metr… ""     ""      2014    ""      2       40      ""     
#>  4 Argentina Rosar… Mi B… ""     ""      2 Dece… ""      47      480     ""     
#>  5 Argentina San L… Bici… "Bici… ""      27 Nov… ""      8       80      ""     
#>  6 Australia Melbo… Melb… "PBSC… "Motiv… June 2… "30 No… 53      676     ""     
#>  7 Australia Melbo… oBike "4 Ge… ""      July 2… "July … dockle… 1250    ""     
#>  8 Australia Brisb… City… "3 Ge… "JCDec… Septem… ""      150     2000    ""     
#>  9 Australia Sydney oBike "4 Ge… ""      July 2… "July … dockle… 1250    ""     
#> 10 Australia Sydney Ofo   "4 Ge… ""      Octobe… ""      dockle… 600     ""     
#> # … with 539 more rows, and abbreviated variable names ¹​Operator, ²​Launched,
#> #   ³​Discontinued, ⁴​Stations, ⁵​Bicycles, ⁶​`Daily ridership`

Created on 2023-04-10 with reprex v2.0.2
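
If more than one table is needed, a small variation on the same idea may help: html_elements() (plural) returns every matching node, and html_table() applied to that node set returns a list of tibbles. The sketch below assumes rvest 1.0.0 or later and R 4.1 or later (for the native pipe). To stay runnable offline it uses a tiny made-up page built with minimal_html() instead of the Wikipedia page, so the table contents are illustrative only.

```r
library(rvest)

# Hypothetical stand-in page with two wikitable-style tables (not the real Wikipedia data)
page <- minimal_html('
  <table class="wikitable">
    <tr><th>City</th><th>Bicycles</th></tr>
    <tr><td>Tirana</td><td>200</td></tr>
  </table>
  <table class="wikitable">
    <tr><th>City</th><th>Stations</th></tr>
    <tr><td>Mendoza</td><td>2</td></tr>
    <tr><td>Rosario</td><td>47</td></tr>
  </table>
')

# html_elements() returns all matches; html_table() on a node set returns a list of tibbles
tables <- page |>
  html_elements("table.wikitable") |>
  html_table(header = TRUE)

length(tables)   # number of tables found
tables[[1]]      # first table, equivalent to html_element() followed by html_table()
```

On the real page, replacing page with root_node should give one list entry per "table.wikitable" node. How many entries that is depends on the current state of the Wikipedia article.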


0 votes

We can do it like this:

library(rvest)

root_node <- read_html("https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems")

table_nodes <- html_nodes(root_node, "table")

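# i only takes the value 1 here, so only the first table is printed;
# use seq_along(table_nodes) instead of 1 to loop over every table
for (i in 1) {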
  table_html <- table_nodes[[i]]
  table_df <- html_table(table_html)
  print(table_df)
}
# A tibble: 549 × 10
   Country   City                  Name                 System              Operator                       Launched         Discontinued    Stati…¹ Bicyc…² Daily…³
   <chr>     <chr>                 <chr>                <chr>               <chr>                          <chr>            <chr>           <chr>   <chr>   <chr>  
 1 Albania   Tirana[5]             Ecovolis             ""                  ""                             March 2011       ""              8       200     ""     
 2 Argentina Buenos Aires[6][7]    Ecobici              "Serttel Brasil[8]" "Bike In Baires Consortium[9]" 2010             ""              400     4000    "21917"
 3 Argentina Mendoza[10]           Metrobici            ""                  ""                             2014             ""              2       40      ""     
 4 Argentina Rosario               Mi Bici Tu Bici[11]  ""                  ""                             2 December 2015  ""              47      480     ""     
 5 Argentina San Lorenzo, Santa Fe Biciudad             "Biciudad"          ""                             27 November 2016 ""              8       80      ""     
 6 Australia Melbourne[12]         Melbourne Bike Share "PBSC & 8D"         "Motivate"                     June 2010        "30 November 2… 53      676     ""     
 7 Australia Melbourne[12]         oBike                "4 Gen. oBike"      ""                             July 2017        "July 2018"     dockle… 1250    ""     
 8 Australia Brisbane[14][15]      CityCycle            "3 Gen. Cyclocity"  "JCDecaux"                     September 2010   ""              150     2000    ""     
 9 Australia Sydney                oBike                "4 Gen. oBike"      ""                             July 2017        "July 2018"     dockle… 1250    ""     
10 Australia Sydney                Ofo                  "4 Gen. Ofo"        ""                             October 2017     ""              dockle… 600     ""     
# … with 539 more rows, and abbreviated variable names ¹​Stations, ²​Bicycles, ³​`Daily ridership`
# ℹ Use `print(n = ...)` to see more rows
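
The loop above visits only the first table. To print every table, as the question asks, one option is to iterate over seq_along(table_nodes). The following is a minimal sketch under stated assumptions: the page object is a made-up three-table document built with minimal_html() so the example runs without network access, and all_tables is a hypothetical name for a list that keeps the parsed results.

```r
library(rvest)

# Hypothetical stand-in page (made-up data) so the example runs offline
page <- minimal_html('
  <table><tr><th>Country</th><th>City</th></tr><tr><td>Albania</td><td>Tirana</td></tr></table>
  <table><tr><th>Name</th><th>Launched</th></tr><tr><td>Ecobici</td><td>2010</td></tr></table>
  <table><tr><th>Note</th></tr><tr><td>example</td></tr></table>
')

table_nodes <- html_nodes(page, "table")

# Collect each parsed table so it can be reused after the loop
all_tables <- vector("list", length(table_nodes))

# seq_along() yields 1, 2, ..., length(table_nodes) and is safe when no tables are found
for (i in seq_along(table_nodes)) {
  all_tables[[i]] <- html_table(table_nodes[[i]])
  cat("Table", i, "\n")
  print(all_tables[[i]])
}
```

With the Wikipedia page, the same loop should work if table_nodes is built from root_node as in the question. Using seq_along() rather than 1:length(table_nodes) avoids iterating over c(1, 0) when the selector matches nothing.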