Scraping data inside a tab using RSelenium

Problem description

I want to scrape the summary table under Player statistics on the following page: https://www.sofascore.com/southampton-wolverhampton/dsV

I am trying to use RSelenium for this.

Here is my code so far:

    library(RSelenium)
    library(netstat)  # provides free_port()

    rm <- rsDriver(browser = "chrome", chromever = "111.0.5563.64",
                   verbose = FALSE,
                   port = free_port())

    rmDr <- rm$client
    rmDr$open()
    rmDr$navigate("https://www.sofascore.com/southampton-wolverhampton/dsV")
    elem <- rmDr$findElement(using = 'xpath', '//button[@data-tabid="summary"]')

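The summary data appears when I click the Summary button, so I tried to locate that button with the XPath shown above. It did not work.

Can you suggest an alternative approach?

Thanks.

Here is the error I get: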

Selenium message:no such element: Unable to locate element: {"method":"xpath","selector":"//button[@data-tabid="summary"]"}
  (Session info: chrome=111.0.5563.65)
For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/no_such_element.html
Build info: version: '4.0.0-alpha-2', revision: 'f148142cf8', time: '2019-07-01T21:30:10'
System info: host: 'DESKTOP-MOGN5AG', ip: '192.168.0.114', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '19.0.2'
Driver info: driver.version: unknown

Error:   Summary: NoSuchElement
     Detail: An element could not be located on the page using the given search parameters.
     class: org.openqa.selenium.NoSuchElementException
     Further Details: run errorDetails method

[Screenshot of the page omitted]

r selenium-webdriver web-scraping rselenium
1 Answer

I clicked the Summary tab using this:

remDr$findElement(using = "css",value = ".fircAT > div:nth-child(2)")$clickElement()
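
A `NoSuchElement` error like the one in the question can also occur when the element has not been rendered yet at the moment `findElement()` runs. The following is a minimal sketch of a polling helper, assuming that is the cause here. `wait_for_element` is a hypothetical name, not part of RSelenium, and the timeout values are arbitrary. It relies on `findElements()` (plural), which returns an empty list instead of raising an error when nothing matches.

```r
# Hypothetical helper (not part of RSelenium): poll until an element appears.
# `remDr` is any object exposing findElements(using, value), such as an
# RSelenium remote driver. `timeout` and `interval` are in seconds.
wait_for_element <- function(remDr, using, value, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # findElements() returns an empty list when nothing matches
    found <- remDr$findElements(using = using, value = value)
    if (length(found) > 0) {
      return(found[[1]])
    }
    if (Sys.time() >= deadline) {
      stop("Timed out waiting for element: ", value)
    }
    Sys.sleep(interval)
  }
}

# Possible usage with the selector from this answer:
# tab <- wait_for_element(remDr, "css selector", ".fircAT > div:nth-child(2)")
# tab$clickElement()
```

Note that `.fircAT` looks like a generated class name, so the selector may need updating if the site changes its markup.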

After the page switches tabs, pull the page HTML and then search it for table nodes. Here is the complete code:

# load libraries
library(RSelenium)
library(rvest)
library(magrittr)

# define target url
url <- "https://www.sofascore.com/southampton-wolverhampton/dsV"


# start RSelenium ------------------------------------------------------------

rD <- rsDriver(browser="firefox", port=4550L, chromever = NULL)
remDr <- rD[["client"]]

# open the remote driver-------------------------------------------------------
remDr$open()

# Navigate to webpage -----------------------------------------------------
remDr$navigate(url)


# click on the summary tab ------------------------------------
remDr$findElement(using = "css", value = ".fircAT > div:nth-child(2)")$clickElement()

# pull the webpage html, then parse it
page_html <- remDr$getPageSource()[[1]] %>%
  read_html()

# find table elements
tables <- page_html %>% html_table()

# take the first table found on the page
summary_stats_table <- tables[[1]]

Here is what it looks like:

summary_stats_table
# A tibble: 32 × 12
   ``    `+`    Goals Assists Tackles Acc. …¹ Duels…² Groun…³ Aeria…⁴ Minut…⁵ Posit…⁶
   <lgl> <chr>  <int>   <int>   <int> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
 1 NA    Moham…     0       0       4 22/32 … 11 (7)  6 (4)   5 (3)   90'     D      
 2 NA    Jan B…     0       0       2 19/30 … 6 (6)   2 (2)   4 (4)   90'     D      
 3 NA    Adama…     0       0       1 11/19 … 11 (7)  11 (7)  0 (0)   45'     F      
 4 NA    Craig…     0       0       1 54/61 … 12 (7)  4 (3)   8 (4)   90'     D      
 5 NA    João …     1       0       1 8/11 (… 7 (2)   5 (2)   2 (0)   20'     M      
 6 NA    Ainsl…     0       0       4 24/36 … 10 (9)  7 (6)   3 (3)   90'     D      
 7 NA    James…     0       0       0 35/42 … 8 (4)   5 (1)   3 (3)   90'     M      
 8 NA    João …     0       0       3 10/12 … 4 (3)   4 (3)   0 (0)   45'     M      
 9 NA    Carlo…     1       0       1 22/26 … 14 (4)  13 (4)  1 (0)   79'     M      
10 NA    Hugo …     0       0       1 19/20 … 5 (2)   3 (2)   2 (0)   45'     D      
# … with 22 more rows, 1 more variable: Rating <dbl>, and abbreviated variable names
#   ¹​`Acc. passes`, ²​`Duels (won)`, ³​`Ground duels (won)`, ⁴​`Aerial duels (won)`,
#   ⁵​`Minutes played`, ⁶​Position
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
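
Picking `tables[[1]]` depends on the summary table being the first table on the page. As a hedged alternative, the sketch below selects a table by its column names instead. `pick_table` is a hypothetical helper, and the inline HTML is made-up demo data standing in for the page source, so the block runs without a browser.

```r
library(rvest)

# Hypothetical helper: return the first table containing all required columns.
pick_table <- function(page, required_cols) {
  tables <- html_table(html_elements(page, "table"))
  has_cols <- vapply(
    tables,
    function(tb) all(required_cols %in% names(tb)),
    logical(1)
  )
  if (!any(has_cols)) {
    stop("No table contains the requested columns")
  }
  tables[[which(has_cols)[1]]]
}

# Made-up demo page with two tables; only the second has player statistics.
demo_page <- read_html(
  "<table><tr><th>Team</th><th>Score</th></tr>
     <tr><td>A</td><td>1</td></tr></table>
   <table><tr><th>Player</th><th>Goals</th><th>Assists</th></tr>
     <tr><td>X</td><td>1</td><td>0</td></tr></table>"
)

stats <- pick_table(demo_page, c("Goals", "Assists"))
```

With the real page, `page_html` from the code above would take the place of `demo_page`.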