I want to scrape the summary table under Player statistics on the following page: https://www.sofascore.com/southampton-wolverhampton/dsV
I am trying to use RSelenium for this.
This is my code so far:
library(RSelenium)
library(netstat)  # provides free_port()
rm <- rsDriver(browser = "chrome", chromever = "111.0.5563.64",
               verbose = FALSE,
               port = free_port())
rmDr <- rm$client
rmDr$open()
rmDr$navigate("https://www.sofascore.com/southampton-wolverhampton/dsV")
elem <- rmDr$findElement(using = 'xpath', '//button[@data-tabid="summary"]')
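The summary data appears when I click the Summary button, so I tried to locate that button with the XPath shown above, but it did not work.
Can you suggest an alternative approach?
Thanks.
This is the error I get: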
Selenium message:no such element: Unable to locate element: {"method":"xpath","selector":"//button[@data-tabid="summary"]"}
(Session info: chrome=111.0.5563.65)
For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/no_such_element.html
Build info: version: '4.0.0-alpha-2', revision: 'f148142cf8', time: '2019-07-01T21:30:10'
System info: host: 'DESKTOP-MOGN5AG', ip: '192.168.0.114', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '19.0.2'
Driver info: driver.version: unknown
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Further Details: run errorDetails method
I clicked the Summary tab using this:
remDr$findElement(using = "css",value = ".fircAT > div:nth-child(2)")$clickElement()
Once the page has switched tabs, pull the page HTML and search it for table nodes. Here is the complete code:
# load libraries
library(RSelenium)
library(rvest)
library(magrittr)
# define target url
url <- "https://www.sofascore.com/southampton-wolverhampton/dsV"
# start RSelenium ------------------------------------------------------------
rD <- rsDriver(browser="firefox", port=4550L, chromever = NULL)
remDr <- rD[["client"]]
# open the remote driver-------------------------------------------------------
remDr$open()
# Navigate to webpage -----------------------------------------------------
remDr$navigate(url)
# click on the summary tab ------------------------------------
remDr$findElement(using = "css",value = ".fircAT > div:nth-child(2)")$clickElement()
# pull the webpage html
# then read it
page_html <- remDr$getPageSource()[[1]] %>%
  read_html()
# find table elements
tables <- page_html %>% html_table()
summary_stats_table <- tables[[1]]
This is what it looks like:
summary_stats_table
# A tibble: 32 × 12
`` `+` Goals Assists Tackles Acc. …¹ Duels…² Groun…³ Aeria…⁴ Minut…⁵ Posit…⁶
<lgl> <chr> <int> <int> <int> <chr> <chr> <chr> <chr> <chr> <chr>
1 NA Moham… 0 0 4 22/32 … 11 (7) 6 (4) 5 (3) 90' D
2 NA Jan B… 0 0 2 19/30 … 6 (6) 2 (2) 4 (4) 90' D
3 NA Adama… 0 0 1 11/19 … 11 (7) 11 (7) 0 (0) 45' F
4 NA Craig… 0 0 1 54/61 … 12 (7) 4 (3) 8 (4) 90' D
5 NA João … 1 0 1 8/11 (… 7 (2) 5 (2) 2 (0) 20' M
6 NA Ainsl… 0 0 4 24/36 … 10 (9) 7 (6) 3 (3) 90' D
7 NA James… 0 0 0 35/42 … 8 (4) 5 (1) 3 (3) 90' M
8 NA João … 0 0 3 10/12 … 4 (3) 4 (3) 0 (0) 45' M
9 NA Carlo… 1 0 1 22/26 … 14 (4) 13 (4) 1 (0) 79' M
10 NA Hugo … 0 0 1 19/20 … 5 (2) 3 (2) 2 (0) 45' D
# … with 22 more rows, 1 more variable: Rating <dbl>, and abbreviated variable names
# ¹`Acc. passes`, ²`Duels (won)`, ³`Ground duels (won)`, ⁴`Aerial duels (won)`,
# ⁵`Minutes played`, ⁶Position
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
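One possible cause of a NoSuchElement error on a page like this is that the element is rendered by JavaScript after navigate() has returned. The post does not confirm that this is what happened here, so treat the following as a minimal sketch under that assumption: poll for the element before clicking it. wait_for is a hypothetical helper written for this illustration; it is not part of RSelenium.

```r
# Hypothetical helper (not part of RSelenium): call `f` repeatedly until it
# returns a non-NULL value without error, or stop after `timeout` seconds.
wait_for <- function(f, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat any error (e.g. NoSuchElement) as "not ready yet"
    result <- tryCatch(f(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    if (Sys.time() >= deadline) stop("timed out waiting for condition")
    Sys.sleep(interval)
  }
}

# Usage with the session from the code above (assumes remDr is already open):
# tab <- wait_for(function() {
#   remDr$findElement(using = "css", value = ".fircAT > div:nth-child(2)")
# })
# tab$clickElement()
```

The helper retries only the lookup, so the click and the later getPageSource() call stay exactly as in the code above.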