How can I make the Beautiful Soup HTML parser return the same code that Chrome shows when I click "Inspect"?

Question  Votes: 0  Answers: 1

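So basically, I'm trying to build a web scraper to find the reviews of a product on the AliExpress website. However, when I parse the HTML, the parsed code differs from what I see in Chrome's "Inspect" window, and I can't find the reviews section in the parsed code. How can I parse the page so that I get exactly the code shown in the Inspect window?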

from bs4 import BeautifulSoup as soup  # HTML data structure
from urllib.request import urlopen as uReq  # Web client

# URL to scrape.
page_url = "https://www.aliexpress.com/item/4000042292255.html?spm=a2g0o.productlist.0.0.4a253632RWxaLa&algo_pvid=c73bf552-ce47-43f6-9abb-b4a994eeaa01&algo_expid=c73bf552-ce47-43f6-9abb-b4a994eeaa01-0&btsid=2c594979-4027-410a-a7a4-7246ce06ade7&ws_ab_test=searchweb0_0,searchweb201602_7,searchweb201603_53"

# opens the connection and downloads html page from url
uClient = uReq(page_url)

# parses html into a soup data structure to traverse html
# as if it were a json data type.
page_soup = soup(uClient.read(), "html.parser")
uClient.close()
python-3.x beautifulsoup html-parsing
1 Answer
0
votes

The content is generated dynamically, so you need to render the page before scraping it. Here is an example using simplified_scrapy and pyppeteer.

from simplified_html.request_render import RequestRender
from simplified_scrapy.simplified_doc import SimplifiedDoc

req = RequestRender({'executablePath': '/Applications/chrome.app/Contents/MacOS/Google Chrome'})

def callback(html, url, data):
    doc = SimplifiedDoc(html)
    print(doc.title)

req.get('https://www.aliexpress.com/item/4000042292255.html?spm=a2g0o.productlist.0.0.4a253632RWxaLa&algo_pvid=c73bf552-ce47-43f6-9abb-b4a994eeaa01&algo_expid=c73bf552-ce47-43f6-9abb-b4a994eeaa01-0&btsid=2c594979-4027-410a-a7a4-7246ce06ade7&ws_ab_test=searchweb0_0,searchweb201602_7,searchweb201603_53', callback)

Result:

{'tag': 'title', 'html': 'Note 7 pro smartphones 4G LTE celulares 4GB RAM 64GB ROM quad core 13MP camera 18:9 IPS Android mobile phones face ID unlocked-in Cellphones from Cellphones & Telecommunications on AliExpress'}
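
To illustrate what "dynamically generated" means here, the following is a minimal sketch using only the standard library's html.parser (the same backend that BeautifulSoup's "html.parser" option builds on). STATIC_HTML and the "reviews" id are made up for illustration and are not AliExpress's real markup. The point is that an HTML parser never runs the script, so the container it would fill stays empty.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for what urlopen() returns: the server sends an empty
# container plus a script; the reviews only appear once a browser runs it.
STATIC_HTML = """
<html><body>
  <div id="reviews"></div>
  <script>
    document.getElementById("reviews").innerHTML = "<p>Great phone</p>";
  </script>
</body></html>
"""


class ReviewTextCollector(HTMLParser):
    """Collects text that appears inside <div id="reviews">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while the parser is inside the reviews container
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == "reviews":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Script bodies arrive here as plain text, but outside the container.
        if self.depth and data.strip():
            self.texts.append(data.strip())


collector = ReviewTextCollector()
collector.feed(STATIC_HTML)
print(collector.texts)  # [] because the script was never executed
```

Chrome's Inspect window shows the DOM after such scripts have run, which is why it differs from the downloaded source.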

You can find more simplified_scrapy examples here
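
If you would rather keep BeautifulSoup for the parsing step, a rough sketch that calls pyppeteer directly might look like the following. It assumes pyppeteer and bs4 are installed; fetch_rendered_html and parse_rendered are hypothetical helper names, not part of either library, and the "networkidle2" wait condition is only a guess at when the page has finished loading.

```python
import asyncio


async def fetch_rendered_html(url, executable_path=None):
    """Return the page HTML after a headless browser has run its JavaScript."""
    # Imported lazily so this module can be loaded without pyppeteer installed.
    from pyppeteer import launch

    options = {"headless": True}
    if executable_path:
        # Optional: reuse a local Chrome binary instead of pyppeteer's download.
        options["executablePath"] = executable_path

    browser = await launch(options)
    try:
        page = await browser.newPage()
        # Treat navigation as done once at most two connections remain open.
        await page.goto(url, {"waitUntil": "networkidle2"})
        return await page.content()
    finally:
        await browser.close()


def parse_rendered(url):
    """Render the page, then hand the resulting HTML to BeautifulSoup."""
    from bs4 import BeautifulSoup

    html = asyncio.run(fetch_rendered_html(url))
    return BeautifulSoup(html, "html.parser")
```

Calling parse_rendered(page_url) should then return a soup built from the rendered DOM rather than the raw download. If the reviews sit in an iframe or load only after scrolling, extra steps would still be needed.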
