I can't scrape all the products on an AliExpress page

Question

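I'm trying to scrape all the products on an AliExpress page with the code below, but it only returns the first 10 products.

I expected this code to return every product, because the CSS selector matches all the product names.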

library(rvest)

# Search results page to scrape
AlPage <- "https://www.aliexpress.com/w/wholesale-running-shoes.html?SearchText=running+shoes&catId=0&g=n&initiative_id=SB_20230318171033&sortType=total_tranpro_desc&spm=a2g0o.home.1000002.0&trafficChannel=main"

# Download and parse the static HTML of the page
url <- read_html(AlPage)

print(url)

# Extract the product titles via the two title classes used on the page
alproduct_name <- html_nodes(url, ".manhattan--title--24F0J-G, .cards--title--2rMisuY") %>% html_text2()
alproduct_name

I also checked the class names of all the products, because I thought they might differ, but they are all the same.
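One way to double-check this from R is to count how many nodes the selector matches in the HTML that `read_html()` actually received. The sketch below does that on a small hand-written fragment; the markup and product names are invented for illustration and are not copied from AliExpress. Running the same `length()` check on the parsed page would show whether the remaining products are present in the downloaded HTML at all.

```r
library(rvest)

# Hypothetical fragment standing in for the downloaded search page
page <- minimal_html('
  <h1 class="manhattan--title--24F0J-G">Shoe A</h1>
  <h1 class="manhattan--title--24F0J-G">Shoe B</h1>
  <h1 class="cards--title--2rMisuY">Shoe C</h1>
')

# Same selector as in the question: either of the two title classes
titles <- page %>%
  html_elements(".manhattan--title--24F0J-G, .cards--title--2rMisuY") %>%
  html_text2()

# Number of product titles present in the parsed HTML
length(titles)
```

If the count on the real page is 10 even though the selector is right, the missing products were never in the static HTML to begin with.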

Answer 1

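I suspect the initial page only contains the first ten results and loads the rest dynamically as the user scrolls down, so you can't easily do this with rvest alone. Here is an approach using RSelenium:

I also changed the HTML node to h1. The nodes you found didn't work for me, but h1 still pulls the shoe names from this page.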

# load packages --------------------------------------------------------
library(RSelenium)
library(rvest)

# define url ---------------------------------------------------------
url <- "https://www.aliexpress.com/w/wholesale-running-shoes.html?SearchText=running+shoes&catId=0&g=n&initiative_id=SB_20230318171033&sortType=total_tranpro_desc&spm=a2g0o.home.1000002.0&trafficChannel=main"

# start RSelenium ------------------------------------------------------------
rD <- rsDriver(browser = "firefox", port = 4548L, chromever = NULL)
remDr <- rD[["client"]]

# Navigate to webpage -----------------------------------------------------
remDr$navigate(url)

# scroll to bottom of the page to load all the results
webElem <- remDr$findElement("css", "body")
webElem$sendKeysToElement(list(key = "end"))

# give the dynamically loaded results a moment to render
# (2 seconds is a guess, not a measured value; adjust as needed)
Sys.sleep(2)

# pull page html
html <- remDr$getPageSource()[[1]]

# Use rvest to read the webpage
AlPage <- html %>% read_html()

# scan the webpage for the h1 nodes and pull the text associated with them
alproduct_name <- AlPage %>%
  html_nodes("h1") %>%
  html_text2()

alproduct_name

That should come to 46 results, right?
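If the page loads its results in several batches, a single End key press may not be enough. A minimal sketch of one way to handle that is to keep scrolling until the item count stops growing. The helper name `scroll_until_stable` and its arguments are hypothetical and not part of RSelenium; the helper takes the scroll action and the counting step as functions, so it can be tried without a browser. The demo below uses a simulated page that reveals 12 more items per scroll, up to 46.

```r
# Scroll repeatedly until the number of loaded items stops increasing.
# scroll_once: function that performs one scroll (hypothetical argument)
# count_items: function that returns the current number of items
# max_rounds:  upper bound on scroll attempts, to avoid looping forever
# pause:       seconds to wait after each scroll for new items to load
scroll_until_stable <- function(scroll_once, count_items,
                                max_rounds = 20, pause = 1) {
  previous <- count_items()
  for (i in seq_len(max_rounds)) {
    scroll_once()
    Sys.sleep(pause)
    current <- count_items()
    if (current <= previous) break  # nothing new appeared; assume the end
    previous <- current
  }
  previous
}

# Simulated page: starts with 10 items, each scroll reveals up to 12 more
fake_page <- new.env()
fake_page$n <- 10
n_loaded <- scroll_until_stable(
  scroll_once = function() fake_page$n <- min(fake_page$n + 12, 46),
  count_items = function() fake_page$n,
  pause = 0
)
n_loaded

# With a live session (assuming `remDr` from the answer above), the call
# might look like this:
# body <- remDr$findElement("css", "body")
# scroll_until_stable(
#   scroll_once = function() body$sendKeysToElement(list(key = "end")),
#   count_items = function() length(remDr$findElements("css", "h1"))
# )
```

Once the count has stabilized, `remDr$getPageSource()` can be parsed with rvest as before. Stopping when the count does not change is a heuristic: a slow connection may need a longer `pause` to avoid stopping early.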
