Web scraping with rvest

Consider the following page:

https://www.whoscored.com/Regions/252/Tournaments/2/Seasons/8618/Stages/19793/Fixtures/England-Premier-League-2021-2022

I want to extract all the matches into R. To do that, I wrote the following code:

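library(rvest)
library(dplyr)
# Download and parse the page's HTML (the result is a document, not a URL)
url <- read_html("https://www.whoscored.com/Regions/252/Tournaments/2/Seasons/8618/Stages/19793/Fixtures/England-Premier-League-2021-2022")
# Select every fixture row and keep only its text
names <- url %>% html_nodes(".divtable-row") %>% html_text()
names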

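where ".divtable-row" was obtained with the SelectorGadget tool. The problem is that this code returns character(0). Somehow it does not see the area I am selecting. Do you know why this happens?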
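A self-contained way to reproduce the same empty result without contacting the site is sketched below. It assumes the rows are added in the browser by a script; the HTML string is made up for illustration and is not whoscored's real markup. rvest parses the HTML exactly as delivered and does not run scripts, so the selector matches nothing.

```r
library(rvest)

# Hypothetical markup: the server sends an empty container, and a script
# would add the .divtable-row element only when run inside a browser.
static_html <- '
<div id="tournament-fixture"></div>
<script>
  var row = document.createElement("div");
  row.className = "divtable-row";
  document.getElementById("tournament-fixture").appendChild(row);
</script>'

page <- read_html(static_html)

# The container itself is present in the parsed document ...
container <- page %>% html_nodes("#tournament-fixture")

# ... but the <script> is never executed, so no .divtable-row node exists
# and html_text() returns character(0).
rows <- page %>% html_nodes(".divtable-row") %>% html_text()
rows
```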

1 answer

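The fixture table on this page is built by JavaScript in the browser after the page loads. read_html() only downloads the HTML the server sends and does not run scripts, so the .divtable-row elements you see in the browser are not in the document rvest parses. That is why you get character(0). You could consider something like this instead, which uses RSelenium to drive a real browser and reads the rendered table: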

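library(RSelenium)
library(rvest)

# Start a Selenium server with Firefox in Docker (Docker must be installed).
# shell() exists only on Windows; system() also works on macOS and Linux.
system("docker run -d -p 4446:4444 selenium/standalone-firefox")
Sys.sleep(10)  # give the container time to start before connecting

remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4446L, browserName = "firefox")
url <- "https://www.whoscored.com/Regions/252/Tournaments/2/Seasons/8618/Stages/19793/Fixtures/England-Premier-League-2021-2022"
remDr$open()
remDr$navigate(url)
Sys.sleep(5)  # wait for the page's JavaScript to render the fixture table

# Grab the rendered fixture table by its id and take its visible text
web_Obj_Table <- remDr$findElement("id", "tournament-fixture")
text_Table <- web_Obj_Table$getElementText()[[1]]
remDr$close()

# Each finished match reads: time + "FT", home team, "h : a", away team, "Match Report";
# the optional group allows two-word team names such as "Manchester United"
stringr::str_extract_all(text_Table, "\\d{1,2}:\\d{1,2}FT\\n[:alpha:]*(|[:space:]*[:alpha:]*)\\n\\d{1,2}[:space:]:[:space:]\\d{1,2}\\n[:alpha:]*(|[:space:]*[:alpha:]*)\\nMatch Report")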

[[1]]
 [1] "14:00FT\nEverton\n1 : 0\nChelsea\nMatch Report"                  "14:00FT\nTottenham\n3 : 1\nLeicester\nMatch Report"             
 [3] "16:30FT\nWest Ham\n1 : 2\nArsenal\nMatch Report"                 "20:00FT\nManchester United\n3 : 0\nBrentford\nMatch Report"     
 [5] "15:00FT\nChelsea\n2 : 2\nWolverhampton\nMatch Report"            "15:00FT\nBurnley\n1 : 3\nAston Villa\nMatch Report"             
 [7] "15:00FT\nBrentford\n3 : 0\nSouthampton\nMatch Report"            "17:30FT\nBrighton\n4 : 0\nManchester United\nMatch Report"      
 [9] "19:45FT\nLiverpool\n1 : 1\nTottenham\nMatch Report"              "14:00FT\nLeicester\n1 : 2\nEverton\nMatch Report"               
[11] "14:00FT\nNorwich\n0 : 4\nWest Ham\nMatch Report"                 "16:30FT\nManchester City\n5 : 0\nNewcastle\nMatch Report"       
[13] "20:00FT\nAston Villa\n1 : 2\nLiverpool\nMatch Report"            "19:45FT\nLeicester\n3 : 0\nNorwich\nMatch Report"               
[15] "19:45FT\nWatford\n0 : 0\nEverton\nMatch Report"                  "20:15FT\nWolverhampton\n1 : 5\nManchester City\nMatch Report"   
[17] "12:00FT\nTottenham\n1 : 0\nBurnley\nMatch Report"                "14:00FT\nLeeds\n1 : 1\nBrighton\nMatch Report"                  
[19] "14:00FT\nWolverhampton\n1 : 1\nNorwich\nMatch Report"            "14:00FT\nWest Ham\n2 : 2\nManchester City\nMatch Report"        
[21] "14:00FT\nWatford\n1 : 5\nLeicester\nMatch Report"                "14:00FT\nAston Villa\n1 : 1\nCrystal Palace\nMatch Report"      
[23] "20:00FT\nNewcastle\n2 : 0\nArsenal\nMatch Report"                "19:45FT\nSouthampton\n1 : 2\nLiverpool\nMatch Report"           
[25] "19:45FT\nEverton\n3 : 2\nCrystal Palace\nMatch Report"           "20:00FT\nChelsea\n1 : 1\nLeicester\nMatch Report"               
[27] "16:00FT\nArsenal\n5 : 1\nEverton\nMatch Report"                  "16:00FT\nBrighton\n3 : 1\nWest Ham\nMatch Report"               
[29] "16:00FT\nBurnley\n1 : 2\nNewcastle\nMatch Report"                "16:00FT\nChelsea\n2 : 1\nWatford\nMatch Report"                 
[31] "16:00FT\nCrystal Palace\n1 : 0\nManchester United\nMatch Report" "16:00FT\nLeicester\n4 : 1\nSouthampton\nMatch Report"           
[33] "16:00FT\nLiverpool\n3 : 1\nWolverhampton\nMatch Report"          "16:00FT\nManchester City\n3 : 2\nAston Villa\nMatch Report"     
[35] "16:00FT\nNorwich\n0 : 5\nTottenham\nMatch Report"
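If you want the matches as a table rather than as strings, one option (a sketch, not part of the original answer) is to split each string on its newlines. The two input strings are copied from the output above; in practice you would use the first element of the list returned by str_extract_all(). The names `parts`, `scores` and `fixtures` and the column names are illustrative choices.

```r
# Two strings copied from the output above
matches <- c(
  "14:00FT\nEverton\n1 : 0\nChelsea\nMatch Report",
  "20:00FT\nManchester United\n3 : 0\nBrentford\nMatch Report"
)

# Each string has five newline-separated parts:
# kick-off time + "FT", home team, "home : away" score, away team, "Match Report"
parts <- strsplit(matches, "\n", fixed = TRUE)

fixtures <- data.frame(
  time = sub("FT$", "", vapply(parts, `[`, character(1), 1)),
  home = vapply(parts, `[`, character(1), 2),
  away = vapply(parts, `[`, character(1), 4),
  stringsAsFactors = FALSE
)

# Split the "h : a" score into two integer columns
scores <- strsplit(vapply(parts, `[`, character(1), 3), " : ", fixed = TRUE)
fixtures$home_goals <- as.integer(vapply(scores, `[`, character(1), 1))
fixtures$away_goals <- as.integer(vapply(scores, `[`, character(1), 2))

fixtures
```

Using base strsplit() keeps the parsing independent of the regex; it relies only on the five-part layout shown in the output.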