Scraping tables from a site with multiple tables into R


I'm practicing getting tables from websites into R. It feels like every site needs its own strategy. I've managed a few, but this one has me stumped: https://www.cbssports.com/fantasy/football/rankings/ppr/top200/

How would you get one (or all four) of these tables into R, presumably using the rvest package?

r rvest
1 answer (0 votes)
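library(rvest)
library(dplyr)    # mutate(), n()
library(tidyr)    # separate_wider_delim()
library(stringr)  # str_remove()

url <- "https://www.cbssports.com/fantasy/football/rankings/ppr/top200/"

# get the html
html <- read_html(url)

# `authors` is used below but was never defined in the original answer.
# It needs one label per table on the page (four here). Only "Consensus"
# appears in the output below; the other three are placeholders to replace
# with the names shown on the page.
authors <- c("Consensus", "Expert 2", "Expert 3", "Expert 4")

# create the data frame; each selector returns the values of all four tables, stacked
data.frame(
  rank = html_elements(html, ".rank") |> html_text2(),
  name = html_elements(html, ".player-name") |> html_text2(),
  team_position_cost = html_elements(html, ".team") |> html_text2(),
  bye = html_elements(html, ".player-stats") |> html_text2()
) |>
  # add the author column: repeat each label once per row of its table
  mutate(authors = rep(authors, each = n() / length(authors))) |>
  # split team_position_cost into separate columns
  separate_wider_delim(team_position_cost, delim = " ", names = c("team", "position", "cost")) |>
  mutate(
    position = as.factor(position),
    team = as.factor(team),
    cost = as.integer(str_remove(cost, "\\$")),
    bye = as.integer(bye)
  )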

Output:

# A tibble: 800 × 7
   rank  name         team  position  cost   bye authors  
   <chr> <chr>        <fct> <fct>    <int> <int> <chr>    
 1 1     J. Jefferson MIN   WR          34    13 Consensus
 2 2     J. Chase     CIN   WR          33     7 Consensus
 3 3     C. McCaffrey SF    RB          34     9 Consensus
 4 4     A. Ekeler    LAC   RB          31     5 Consensus
 5 5     T. Hill      MIA   WR          30    10 Consensus
 6 6     C. Kupp      LAR   WR          30    10 Consensus
 7 7     B. Robinson  ATL   RB          27    11 Consensus
 8 8     S. Diggs     BUF   WR          26    13 Consensus
 9 9     T. Kelce     KC    TE          23    10 Consensus
10 10    S. Barkley   NYG   RB          26    13 Consensus
# ℹ 790 more rows
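
Since the question asks for one table or all four, here is a rough sketch of the labelling and splitting steps on a tiny made-up data frame, with no scraping involved. The player names, teams, costs, and the "Expert 2" label are invented for illustration. It assumes, as the pipeline above does, that every table contributes the same number of rows and that they arrive stacked in page order.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the scraped columns: two "tables" of two rows each, stacked.
# Every value here is made up for illustration.
scraped <- data.frame(
  rank = c("1", "2", "1", "2"),
  name = c("Player A", "Player B", "Player B", "Player A"),
  team_position_cost = c("AAA WR $30", "BBB RB $25", "BBB RB $28", "AAA WR $27")
)
authors <- c("Consensus", "Expert 2")  # hypothetical labels, one per table

combined <- scraped |>
  # label each block of rows with the author of the table it came from
  mutate(authors = rep(authors, each = n() / length(authors))) |>
  # "AAA WR $30" -> team = "AAA", position = "WR", cost = "$30"
  separate_wider_delim(team_position_cost, delim = " ",
                       names = c("team", "position", "cost")) |>
  # drop the dollar sign and convert to integer
  mutate(cost = as.integer(sub("$", "", cost, fixed = TRUE)))

# One table only: keep the rows carrying that table's label
consensus <- filter(combined, authors == "Consensus")

# All tables: a named list with one data frame per author
by_author <- split(combined, combined$authors)
```

With the real data, `by_author[["Consensus"]]` would hold the 200 rows of the first table, provided the labels in `authors` match the order of the tables on the page.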