Rvest: web scraping a Japanese baseball website


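I'm trying to scrape two tables from the npb.jp website using the rvest package in R. I've tried CSS selectors for both tables, but to no avail. Could the problem be the way the page is formatted?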

Code:

html  <- read_html("https://npb.jp/bis/eng/2022/stats/std_c.html")
css <- "#stdivmaintbl > table > tbody > tr > td > div:nth-child(1)"
nodes <-  html_nodes(html, css)
table <-  html_table(nodes)[[1]]

df <- data.frame(table)

The code reads the HTML, but it doesn't seem to find the tables.

Any help is appreciated.

html css r web-scraping rvest
1 Answer

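For whatever reason, when I tried to read the URL directly I got a certificate error, so I copied the page's source HTML into a file and read that file instead of the URL. I'm assuming that what I read from the file is the same as what you would read from the internet. This worked for me: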

library(rvest)
library(magrittr)


# this is where I saved the page's html
# assuming you don't have the same certificate problem I had, 
# you could use this instead: url <- "https://npb.jp/bis/eng/2022/stats/std_c.html"
url <- "baseball.html"

table <- url %>% read_html() %>% html_nodes(".stdtblmain") %>% html_table()

table[[1]]
> table[[1]]
# A tibble: 27 × 239
   X1        X2    X3    X4    X5    X6    X7    X8    X9    X10   X11   X12   X13   X14   X15   X16   X17   X18   X19   X20  
   <chr>     <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 "TeamGWL… "Tea… G     W     L     T     PCT   "GB"  ""    Home  Road  ""    "vsS" vsDB  vsT   vsG   vsC   vsD   Int   Toky…
 2 "Team"    "G"   W     L     T     PCT   GB    ""    ""    Home  Road  ""    ""    vsS   vsDB  vsT   vsG   vsC   vsD   Int  
 3 "Tokyo Y… ""    Toky… 143   80    59    4     ""    ""    .576  --    ""    ""    37-34 43-2… ***   16-9  13-1… 11-1… 16-8…
 4 ""        "Tok… NA    NA    NA    NA    NA    ""    ""    NA    NA    ""    ""    NA    NA    NA    NA    NA    NA    NA   
 5 "YOKOHAM… ""    YOKO… 143   73    68    2     ""    ""    .518  8.0   ""    ""    41-3… 32-3… 9-16  ***   16-9  13-1… 8-17 
 6 ""        "YOK… NA    NA    NA    NA    NA    ""    ""    NA    NA    ""    ""    NA    NA    NA    NA    NA    NA    NA   
 7 "Hanshin… ""    Hans… 143   68    71    4     ""    ""    .489  12.0  ""    ""    37-3… 31-3… 11-1… 9-16  ***   14-1… 9-14…
 8 ""        "Han… NA    NA    NA    NA    NA    ""     NA   NA    NA     NA   ""    NA    NA    NA    NA    NA    NA    NA   
 9 "Yomiuri… ""    Yomi… 143   68    72    3     ".48… "12.… 35-3… 33-3… "13-… "11-… 10-1… ***   13-12 13-12 8-10  NA    NA   
10 ""        "Yom… NA    NA    NA    NA    NA     NA    NA   NA    NA     NA    NA   NA    NA    NA    NA    NA    NA    NA   
# … with 17 more rows, and 219 more variables: X21 <chr>, X22 <chr>, 
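
As for why the original selector found nothing: I can't say for certain without the page source in front of me, but one common cause is the `tbody` step. Selectors copied from a browser's dev tools usually contain `tbody` because browsers insert that element into the DOM, while the parser behind `read_html()` leaves the markup as written. The selector in the question also ends in a `div`, not a `table`, which `html_table()` is not designed for. Here is a minimal sketch of the `tbody` effect. The HTML string, the `#wrapper` id, and the `demo` class are all made up for illustration and are not taken from npb.jp:

```r
library(rvest)

# Hypothetical stand-in for a real page: a table written without an
# explicit <tbody>, as raw HTML often is.
doc <- read_html('
  <div id="wrapper">
    <table class="demo">
      <tr><th>Team</th><th>W</th><th>L</th></tr>
      <tr><td>A</td><td>80</td><td>59</td></tr>
      <tr><td>B</td><td>73</td><td>68</td></tr>
    </table>
  </div>')

# Browser-style selector: requires a <tbody> that is not in the source,
# so nothing matches.
with_tbody <- html_nodes(doc, "#wrapper > table > tbody > tr")
length(with_tbody)

# Same selector without "tbody": matches the three rows.
without_tbody <- html_nodes(doc, "#wrapper > table > tr")
length(without_tbody)

# Selecting the <table> itself (here by class) and passing it to
# html_table() gives a data frame with the header row as column names.
standings <- html_table(html_node(doc, "table.demo"))
standings
```

If the real page behaves the same way, a short selector that targets the table directly (such as the `.stdtblmain` class used above) tends to be more robust than a long child-by-child path.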