How can I make the Beautiful Soup HTML parser return the same code that Chrome shows under "Inspect"?

Question

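I am trying to build a web scraper that collects product reviews from AliExpress. However, the HTML I get after parsing differs from what I see in Chrome's "Inspect" window, and I cannot find the reviews section in the parsed code. How can I parse the page so that I get exactly the code shown in the Inspect window?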

from bs4 import BeautifulSoup as soup  # HTML data structure
from urllib.request import urlopen as uReq  # Web client

# URL to scrape. Adjacent string literals are joined into one URL.
page_url = (
    "https://www.aliexpress.com/item/4000042292255.html"
    "?spm=a2g0o.productlist.0.0.4a253632RWxaLa"
    "&algo_pvid=c73bf552-ce47-43f6-9abb-b4a994eeaa01"
    "&algo_expid=c73bf552-ce47-43f6-9abb-b4a994eeaa01-0"
    "&btsid=2c594979-4027-410a-a7a4-7246ce06ade7"
    "&ws_ab_test=searchweb0_0,searchweb201602_7,searchweb201603_53"
)

# opens the connection and downloads html page from url
uClient = uReq(page_url)

# parses html into a soup data structure to traverse html
# as if it were a json data type.
page_soup = soup(uClient.read(), "html.parser")
uClient.close()

Answer 1

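The content is generated dynamically, so you have to render the page before you can scrape it. Here is an example using simplified_scrapy and pyppeteer.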

from simplified_html.request_render import RequestRender
from simplified_scrapy.simplified_doc import SimplifiedDoc

# Path to the local Chrome binary used for rendering (macOS path shown).
req = RequestRender({'executablePath': '/Applications/chrome.app/Contents/MacOS/Google Chrome'})

# The callback receives the rendered HTML of the requested page.
def callback(html, url, data):
    doc = SimplifiedDoc(html)
    print(doc.title)

req.get('https://www.aliexpress.com/item/4000042292255.html?spm=a2g0o.productlist.0.0.4a253632RWxaLa&algo_pvid=c73bf552-ce47-43f6-9abb-b4a994eeaa01&algo_expid=c73bf552-ce47-43f6-9abb-b4a994eeaa01-0&btsid=2c594979-4027-410a-a7a4-7246ce06ade7&ws_ab_test=searchweb0_0,searchweb201602_7,searchweb201603_53', callback)

Result:

{'tag': 'title', 'html': 'Note 7 pro smartphones 4G LTE celulares 4GB RAM 64GB ROM quad core 13MP camera 18:9 IPS Android mobile phones face ID unlocked-in Cellphones from Cellphones &amp; Telecommunications on AliExpress'}
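
The same render-then-parse idea can also be sketched with pyppeteer directly, handing the rendered HTML back to Beautiful Soup. This is only a rough sketch: the function name `fetch_rendered_html` is made up for illustration, and the `networkidle2` wait condition is an assumption that may not be long enough for the reviews section to appear (it may need extra waiting or scrolling).

```python
import asyncio


async def fetch_rendered_html(url):
    """Return the page's HTML after the browser has run its scripts."""
    # Imported lazily so this module still loads where pyppeteer is absent.
    from pyppeteer import launch

    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        # Assumption: waiting for the network to go mostly idle is enough
        # for the dynamic content to have been inserted into the DOM.
        await page.goto(url, {"waitUntil": "networkidle2"})
        return await page.content()
    finally:
        await browser.close()


# Usage (not run here, since it launches a real browser):
#   html = asyncio.run(fetch_rendered_html(page_url))
#   page_soup = soup(html, "html.parser")
```

The string returned by `page.content()` is the DOM as serialized after scripts have run, which is much closer to what the Inspect window shows than the raw response from `urlopen`.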

You can get the simplified examples here
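
To see why the `urlopen` approach from the question cannot find the reviews, here is a small self-contained illustration. It uses Python's standard `html.parser` module (which Beautiful Soup's `"html.parser"` backend builds on) so it runs without extra packages. The page source, the `reviews` id, and the helper names are all invented for the demo and are not AliExpress's real markup.

```python
from html.parser import HTMLParser

# Hypothetical page source: the server sends an empty reviews container,
# and a script fills it in only once a browser executes it.
RAW_HTML = """
<html><body>
  <div id="reviews"></div>
  <script>
    document.getElementById("reviews").textContent = "Great phone!";
  </script>
</body></html>
"""


class TextInElement(HTMLParser):
    """Collect text nodes nested inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while we are inside the target element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.text.append(data.strip())


def static_text(html, element_id):
    """Return the text a static parser sees inside the given element."""
    parser = TextInElement(element_id)
    parser.feed(html)
    return parser.text


print(static_text(RAW_HTML, "reviews"))
```

The parser never executes the script, so the container stays empty even though the review text is present in the script source. Chrome's Inspect window shows the DOM after such scripts have run, which is why the two views differ.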
