I'm trying to scrape the list of fixtures from this website:
https://www.nrl.com/draw/?competition=111&round=1&season=2024
The output should be:
Sea Eagles, Rabbitohs
Roosters, Broncos
Knights, Raiders, etc.
I wrote the following code:
library(rvest)

url <- "https://www.nrl.com/draw/?competition=111&round=1&season=2024"
page <- read_html(url)
contentnodes <- page %>%
  html_nodes("div.u-spacing-mt-24.pre-quench") %>%
  html_attr("q-data") %>%
  jsonlite::fromJSON()
But I get the following error:
lexical error: invalid char in json text NA
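Some suggestions I've read online say the data is HTML rather than JSON, but I have scraped a different page on the same site with similar code, so I'm not sure what is going wrong here.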
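One plausible reading of the error: `html_attr()` returns `NA` when a matched element does not carry the requested attribute, and passing that `NA` to `jsonlite::fromJSON()` fails with a lexical error like the one above. The sketch below reproduces this offline with made-up HTML built by `rvest::minimal_html()`, and wraps the parse in a hypothetical helper, `parse_q_data()`, that drops `NA` values and stops with a clearer message. It does not show what the live page actually contains; it only illustrates why an `NA` reaches the JSON parser.

```r
library(rvest)
library(jsonlite)

# Hypothetical helper: parse q-data attributes, skipping missing (NA) values
parse_q_data <- function(raw) {
  raw <- raw[!is.na(raw)]
  if (length(raw) == 0) {
    stop("no q-data attribute found; the data may be loaded another way")
  }
  lapply(raw, fromJSON)
}

# Made-up HTML: the div matches the selector but has no q-data attribute
without_attr <- minimal_html("<div class='u-spacing-mt-24 pre-quench'>no data</div>")
raw_missing <- without_attr %>%
  html_elements("div.u-spacing-mt-24.pre-quench") %>%
  html_attr("q-data")
print(raw_missing)  # NA, the value fromJSON() cannot parse

# Made-up HTML where the attribute is present and holds valid JSON
with_attr <- minimal_html("<div class='u-spacing-mt-24 pre-quench' q-data='{\"round\":1}'></div>")
raw_ok <- with_attr %>%
  html_elements("div.u-spacing-mt-24.pre-quench") %>%
  html_attr("q-data")
parsed <- parse_q_data(raw_ok)
```

`html_elements()` is the current name for `html_nodes()` in rvest 1.0 and later; either works here.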
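The fixtures are also served as JSON from the site's /draw/data endpoint, which can be requested directly with httr2 instead of parsing the HTML page:
library(tidyverse)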
library(httr2)
"https://www.nrl.com/draw/data?competition=111&season=2024" %>%
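  request() %>%
  req_perform() %>%
  resp_body_json(simplifyVector = TRUE) %>%            # parse the JSON body into data frames
  pluck("fixtures") %>%                                # keep only the fixtures table
  unnest(c(homeTeam, awayTeam), names_sep = "_") %>%   # flatten the nested team columns
  select(contains("nickName"),
         contains("odds"))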
# A tibble: 8 × 4
homeTeam_nickName awayTeam_nickName homeTeam_odds awayTeam_odds
<chr> <chr> <chr> <chr>
1 Sea Eagles Rabbitohs 2.17 1.69
2 Roosters Broncos 2.51 1.53
3 Knights Raiders 1.42 2.87
4 Warriors Sharks 1.60 2.34
5 Storm Panthers 2.24 1.65
6 Eels Bulldogs 1.47 2.70
7 Titans Dragons 1.49 2.64
8 Dolphins Cowboys 2.67 1.48
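To get the "Home, Away" lines asked for in the question, the two nickname columns can be pasted together. This is a minimal sketch using a hypothetical helper, `format_fixtures()`; it runs on two rows copied from the output above so no network call is needed, and on live data it would receive the tibble produced by the pipeline.

```r
library(dplyr)

# Hypothetical helper: collapse the nickname columns into "Home, Away" strings
format_fixtures <- function(fixtures) {
  fixtures %>%
    mutate(fixture = paste(homeTeam_nickName, awayTeam_nickName, sep = ", ")) %>%
    pull(fixture)
}

# Two rows copied from the output above, so the example runs offline
example <- tibble(
  homeTeam_nickName = c("Sea Eagles", "Roosters"),
  awayTeam_nickName = c("Rabbitohs", "Broncos")
)

writeLines(format_fixtures(example))
# Sea Eagles, Rabbitohs
# Roosters, Broncos
```

On live data, appending `%>% format_fixtures() %>% writeLines()` to the httr2 pipeline would print one fixture per line.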