rvest won't scrape from <span>

Question (votes: 0, answers: 1)

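I'm trying to scrape prices from Amazon. It used to work, but it no longer does, and I don't know whether they have implemented some protection or whether I'm using rvest incorrectly.

I'm trying to use this code: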

library(rvest)
library(httr) # user_agent() is exported by httr, not rvest

my_url <- "https://www.amazon.com/s?k=reusable+straws"
user_agent <- user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120")
my_session <- session(my_url, user_agent)

my_session %>%
  html_elements(".a-offscreen")

I can scrape the <a class> above the price just fine, and the <span class="a-size-base a-color-secondary"> below it just fine, but none of the price spans.

Any ideas?

r rvest
1 Answer
0 votes

Consider using a tool such as SelectorGadget to better identify the correct HTML elements to scrape.
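Before building a full pipeline, it can help to count how many nodes each candidate selector matches. The following is a minimal sketch of that check run against an invented HTML fragment: the class names are taken from this thread, but the markup itself is hypothetical and is not Amazon's actual page.

```r
library(rvest)

# Hypothetical fragment modeled on the class names mentioned in this thread;
# it is not Amazon's real markup.
page <- read_html('
  <div class="puis-card-border">
    <span class="a-price"><span class="a-offscreen">$18.99</span></span>
  </div>
  <div class="puis-card-border">
    <span class="a-size-base a-color-secondary">No price shown</span>
  </div>
')

# Candidate selectors to try; ".does-not-exist" is a deliberate miss.
selectors <- c(".puis-card-border", ".a-price", ".a-offscreen", ".does-not-exist")

# Count the matches for each selector. A count of 0 means the selector
# does not occur in the HTML that was actually downloaded.
matches <- vapply(
  selectors,
  function(css) length(html_elements(page, css)),
  integer(1)
)
matches
```

If a selector that works in the browser returns 0 matches on the downloaded page, the HTML that rvest received probably differs from what the browser renders.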

library(tidyverse)
library(rvest)

"https://www.amazon.com/s?k=reusable+straws" %>% 
  read_html() %>% 
  html_elements(".puis-card-border") %>% # Select each product box
  map_dfr(~ tibble( # Map over every box to extract info
    title = html_element(.x, ".a-color-base.a-text-normal") %>% 
      html_text2(), 
    price = html_element(.x, ".a-price") %>% 
      html_text2(), 
    rating = html_element(.x, ".aok-align-bottom") %>% 
      html_text2()
  ))

# A tibble: 60 x 3
   title                                               price rating
   <chr>                                               <chr> <chr> 
 1 "HSHIJYA 18 Pack Reusable Stainless Steel Straws w~ $18.~ 4.7 o~
 2 "Piteno\u00ae 16-Pack Reusable Glass Straws, Clear~ $6.9~ 4.7 o~
 3 "Softy Straws Premium Reusable Stainless Steel Dri~ $12.~ 4.7 o~
 4 "15 FITS ALL TUMBLERS STRAWS - Reusable Silicone S~ $14.~ 4.6 o~
 5 "Tronco Set of 6 Stainless Steel Reusable Metal St~ $9.9~ 4.6 o~
 6 "Hiware 12-Pack Reusable Stainless Steel Metal Str~ $6.2~ 4.8 o~
 7 "24 PCS, Reusable Straws with 4 Brushes, 10.5\" Lo~ $5.9~ 4.6 o~
 8 "Kynup Reusable Straws, 4Pack Collapsible Portable~ $9.9~ 4.6 o~
 9 "Ello Impact Reusable Hard Plastic Straws with Cle~ $3.4~ 4.7 o~
10 "ALINK 10.5 in Long Rainbow Colored Reusable Trita~ $4.9~ 4.7 o~
# i 50 more rows
# i Use `print(n = ...)` to see more rows
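The code above extracts one value per product box, which keeps the columns aligned even when a box has no price. Below is a small sketch of the same idea using rvest's vectorised html_element() instead of map_dfr(), followed by one possible way to turn the price text into a number. The HTML fragment and the ".a-price .a-offscreen" selector are assumptions made for illustration, not Amazon's confirmed markup.

```r
library(rvest)

# Hypothetical markup: the second card deliberately has no price element.
page <- read_html('
  <div class="puis-card-border">
    <h2 class="a-color-base a-text-normal">Steel straws</h2>
    <span class="a-price"><span class="a-offscreen">$18.99</span></span>
  </div>
  <div class="puis-card-border">
    <h2 class="a-color-base a-text-normal">Glass straws</h2>
  </div>
')

cards <- html_elements(page, ".puis-card-border")

# html_element() (singular) returns exactly one result per card and yields
# NA where nothing matches, so both columns have the same length.
products <- data.frame(
  title = html_text2(html_element(cards, ".a-color-base.a-text-normal")),
  price_text = html_text2(html_element(cards, ".a-price .a-offscreen")),
  stringsAsFactors = FALSE
)

# Keep only digits and the decimal point, then convert to numeric.
# Cards without a price stay NA.
products$price <- as.numeric(gsub("[^0-9.]", "", products$price_text))
products
```

Using html_elements() (plural) directly on the whole page would instead drop the missing prices silently, leaving a price vector shorter than the title vector.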