How can I use rvest to get the full URLs from a website?

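I want to use rvest to get the full URLs of some links on a website. When I scrape these links, I get a shortened version of each URL.

How do I get the full URLs?

Here is an example.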

library(rvest)
#> Loading required package: xml2

page <- read_html("http://developer.cbssports.com/documentation/api/files/history/standings/breakdown")

urls <- page %>% 
  html_nodes(".MFile") %>%
  html_nodes("a") %>%
  html_attr("href")

urls
#>  [1] "../../draft-config"                        
#>  [2] "../../draft-order"                         
#>  [3] "../../draft-results"                       
#>  [4] "../../owners"                              
#>  [5] "../../fantasy-points"                      
#>  [6] "../../teams"                               
#>  [7] "../awards"                                 
#>  [8] "../championships"                          
#>  [9] "../draft-results"                          
#> [10] "../draft-stats"                            
#> [11] "../league-records"                         
#> [12] "../league-years"                           
#> [13] "../results"                                
#> [14] "../rosters"                                
#> [15] "../team-records"                           
#> [16] "../teams"                                  
#> [17] "../transaction-list"                       
#> [18] "../vs-opponent"                            
#> [19] "overall"                                   
#> [20] "power"                                     
#> [21] "../../dates"                               
#> [22] "../../league-details"                      
#> [23] "../../league-stats"                        
#> [24] "../../playoff-bracket"                     
#> [25] "../../playoff-settings"                    
#> [26] "../../positions"                           
#> [27] "../../pro-teams"                           
#> [28] "../../rosters"                             
#> [29] "../../rules"                               
#> [30] "../../schedules"                           
#> [31] "../../sports"                              
#> [32] "../../stats"                               
#> [33] "../../fantasy-points/weekly-scoring"       
#> [34] "../../news/headlines"                      
#> [35] "../../league-news/headlines"               
#> [36] "../../players/average-draft-position"      
#> [37] "../../players/inactives"                   
#> [38] "../../players/auction-values"              
#> [39] "../../players/gamelog"                     
#> [40] "../../players/injuries"                    
#> [41] "../../players/list"                        
#> [42] "../../players/minors"                      
#> [43] "../../players/outlook"                     
#> [44] "../../players/outlooks"                    
#> [45] "../../players/profile"                     
#> [46] "../../players/rankings"                    
#> [47] "../../players/search"                      
#> [48] "../../players/updates"                     
#> [49] "../../players/probable-pitchers"           
#> [50] "../../players/roster-trends/most-activated"
#> [51] "../../players/roster-trends/most-added"    
#> [52] "../../players/roster-trends/most-benched"  
#> [53] "../../players/roster-trends/most-dropped"  
#> [54] "../../players/roster-trends/most-owned"    
#> [55] "../../players/roster-trends/most-started"  
#> [56] "../../players/roster-trends/most-traded"   
#> [57] "../../players/roster-trends/most-viewed"   
#> [58] "../../players/scout-team"                  
#> [59] "../../players/two-start-pitchers"          
#> [60] "../../scoring/live"                        
#> [61] "../../scoring/preview"                     
#> [62] "../../scoring/categories"                  
#> [63] "../../scoring/rules"                       
#> [64] "../../standings/breakdown"                 
#> [65] "../../standings/by-period"                 
#> [66] "../../standings/overall"                   
#> [67] "../../standings/power"                     
#> [68] "../../stats/batter-vs-pitcher"             
#> [69] "../../stats/defense-vs-position"           
#> [70] "../../stats/situational-stats"             
#> [71] "../../stats/categories"                    
#> [72] "../../news/story"                          
#> [73] "../../league-news/story"                   
#> [74] "../../transaction-list/add-drops"          
#> [75] "../../transaction-list/trades"             
#> [76] "../../transaction-list/log"                
#> [77] "../../transactions/add-drop"               
#> [78] "../../transactions/lineup"                 
#> [79] "../../transactions/trade"                  
#> [80] "../../transactions/waiver-order"           
#> [81] "../../wildcards"

Take the first result as an example.

The full URL of that link is http://developer.cbssports.com/documentation/api/files/draft-config. It seems that I only get the tail end of the URL when scraping.

r web-scraping rvest
1 Answer

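The href values you scraped are relative URLs. You can resolve them against the URL of the page they came from with xml2::url_absolute: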

main_url <- "http://developer.cbssports.com/documentation/api/files/history/standings/breakdown"
xml2::url_absolute(urls, main_url)
#>  [1] "http://developer.cbssports.com/documentation/api/files/draft-config"                        
#>  [2] "http://developer.cbssports.com/documentation/api/files/draft-order"                         
#>  [3] "http://developer.cbssports.com/documentation/api/files/draft-results"                       
#>  [4] "http://developer.cbssports.com/documentation/api/files/owners"                              
#>  [5] "http://developer.cbssports.com/documentation/api/files/fantasy-points"                      
#>  [6] "http://developer.cbssports.com/documentation/api/files/teams"                               
#>  [7] "http://developer.cbssports.com/documentation/api/files/history/awards"                      
#>  [8] "http://developer.cbssports.com/documentation/api/files/history/championships"               
#>  [9] "http://developer.cbssports.com/documentation/api/files/history/draft-results"               
#> [10] "http://developer.cbssports.com/documentation/api/files/history/draft-stats"                 
#> [11] "http://developer.cbssports.com/documentation/api/files/history/league-records"              
#> [12] "http://developer.cbssports.com/documentation/api/files/history/league-years"                
#> [13] "http://developer.cbssports.com/documentation/api/files/history/results"                     
#> [14] "http://developer.cbssports.com/documentation/api/files/history/rosters"                     
#> [15] "http://developer.cbssports.com/documentation/api/files/history/team-records"                
#> [16] "http://developer.cbssports.com/documentation/api/files/history/teams"                       
#> [17] "http://developer.cbssports.com/documentation/api/files/history/transaction-list"            
#> [18] "http://developer.cbssports.com/documentation/api/files/history/vs-opponent"                 
#> [19] "http://developer.cbssports.com/documentation/api/files/history/standings/overall"           
#> [20] "http://developer.cbssports.com/documentation/api/files/history/standings/power"             
#> [21] "http://developer.cbssports.com/documentation/api/files/dates"                               
#> [22] "http://developer.cbssports.com/documentation/api/files/league-details"                      
#> [23] "http://developer.cbssports.com/documentation/api/files/league-stats"                        
#> [24] "http://developer.cbssports.com/documentation/api/files/playoff-bracket"                     
#> ... etc 
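
To try the same approach without hitting the network, here is a minimal sketch that parses a small inline HTML snippet instead of the live page. The snippet and the variable names `html` and `full_urls` are made up for illustration; only the `.MFile` selector, the three href values, and `main_url` are taken from the question and answer above.

```r
library(rvest)  # re-exports read_html() and %>%; xml2 is installed as a dependency

# Hypothetical stand-in for the real page, so the example runs offline
html <- '<div class="MFile">
  <a href="../../draft-config">draft-config</a>
  <a href="../awards">awards</a>
  <a href="overall">overall</a>
</div>'

# URL the relative links are resolved against (the page that was scraped)
main_url <- "http://developer.cbssports.com/documentation/api/files/history/standings/breakdown"

# Extract the raw (relative) href attributes, as in the question
urls <- read_html(html) %>%
  html_nodes(".MFile") %>%
  html_nodes("a") %>%
  html_attr("href")

# Resolve each relative href against the page URL
full_urls <- xml2::url_absolute(urls, main_url)
full_urls
#> [1] "http://developer.cbssports.com/documentation/api/files/draft-config"
#> [2] "http://developer.cbssports.com/documentation/api/files/history/awards"
#> [3] "http://developer.cbssports.com/documentation/api/files/history/standings/overall"
```

Each `..` segment climbs one directory up from the page's own path, while a bare name such as `overall` stays in the same directory as the page, which is why the three results land at different depths.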