Trouble scraping price information from Newegg with Python

Question (votes: 0, answers: 1)

I'm trying to use BeautifulSoup to get price information from Newegg, but I'm having no luck. I tried the code below, expecting it to return the laptop's price, 1268.

import requests
from bs4 import BeautifulSoup

data = requests.get('https://www.newegg.com/p/1XV-000E-00331?Description=MUHN2LL%2fA&cm_re=MUHN2LL%2fA-_-1XV-000E-00331-_-Product')
soup = BeautifulSoup(data.content, 'lxml')
price = soup.select_one('[itemprop=price]')['content']
print(price)

Can anyone help me get it to return 1268?

html python-3.x web-scraping beautifulsoup price
1 Answer
0 votes

The value you're after is loaded by JavaScript, so requests and bs4 never see it: neither of them renders JS.
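A quick way to check this kind of thing is to look for the element in the raw HTML before any JavaScript runs. The sketch below uses only the standard library; the two HTML snippets are made-up stand-ins for a server response, not actual Newegg markup, and `ItempropFinder` and `static_itemprop_values` are names invented for this illustration.

```python
from html.parser import HTMLParser


class ItempropFinder(HTMLParser):
    """Collects the `content` attribute of every tag with a given itemprop."""

    def __init__(self, itemprop):
        super().__init__()
        self.itemprop = itemprop
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("itemprop") == self.itemprop:
            self.found.append(attr_map.get("content"))


def static_itemprop_values(html, itemprop="price"):
    """Return the itemprop values present in the HTML as served (no JS run)."""
    finder = ItempropFinder(itemprop)
    finder.feed(html)
    return finder.found


# Hypothetical snippets standing in for a server response.
with_price = '<html><body><meta itemprop="price" content="1268.90"></body></html>'
without_price = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'

print(static_itemprop_values(with_price))     # the element is in the static HTML
print(static_itemprop_values(without_price))  # nothing to find until JS runs
```

If the list comes back empty for the real page, `soup.select_one('[itemprop=price]')` in the question returns `None`, and indexing it with `['content']` raises a `TypeError`.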

But here is a workaround.

Every product page contains a stable string, namely:

Compare offers from more sellers as low as $1,268.90 plus shipping

So we can match it with a regex, which you can apply to any other product page as well.

import requests
import re

params = {
    "Description": "MUHN2LL/A",
    "cm_re": "MUHN2LL/A-_-1XV-000E-00331-_-Product"
}


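def main(url):
    r = requests.get(url, params=params)
    # Capture everything after the "$" that follows "low as", up to the last digit.
    match = re.search(r'low as.+\$(.+\d)', r.text)
    # re.search returns None when the string is missing from the page.
    print(match.group(1) if match else "price not found")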


main("https://www.newegg.com/p/1XV-000E-00331")

Output:

1,268.90
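If you need the price as a number rather than a string, something along these lines should work. It is a sketch that assumes the "as low as $..." sentence keeps the shape shown above; the stricter pattern and the `extract_low_price` helper are my own additions, and the sample string is the one quoted earlier, so no request is made.

```python
import re
from decimal import Decimal

# Assumed shape: "low as", optional whitespace, "$", then digits with optional
# thousands separators and an optional decimal part.
PRICE_RE = re.compile(r"low as\s*\$(\d[\d,]*(?:\.\d+)?)")


def extract_low_price(text):
    """Return the 'as low as' price as a Decimal, or None if it is absent."""
    found = PRICE_RE.search(text)
    if found is None:
        return None
    # Drop thousands separators before converting.
    return Decimal(found.group(1).replace(",", ""))


sample = "Compare offers from more sellers as low as $1,268.90 plus shipping"
print(extract_low_price(sample))           # 1268.90
print(extract_low_price("no price here"))  # None
```

`Decimal` is used instead of `float` so the cents are kept exactly as written on the page.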

There is also an uglier approach, where you parse the JSONP-encoded response yourself:

Something like the following:

import requests
from bs4 import BeautifulSoup
import re

params1 = {
    "Description": "MUHN2LL/A",
    "cm_re": "MUHN2LL/A-_-1XV-000E-00331-_-Product"
}

params2 = {
    "FirstCall": "true",
    "PageNum": "1",
    "TabType": "0",
    "FilterBy": "",
    "SortBy": "0",
    "action": "Biz.Product.MoreBuyingOptions.JsonpCallBack"
}


def main(url):
    with requests.Session() as req:
        r = req.get(url, params=params1)
        soup = BeautifulSoup(r.content, 'html.parser')
        params2['ParentItem'] = soup.find(
            "input", id="mboParentItemNumber").get("value")
        params2['MappingId'] = soup.find(
            "input", id="mboMappingId").get("value")
        r = req.get(
            "https://www.newegg.com/Common/Ajax/LoadMoreBuyingOption.aspx", params=params2)
        match = [item.group(1, 2) for item in re.finditer(
            r'price-current-label.+?\>(\d.+?)<.+?p>(.+?)<', r.text)][-1]
        print(match)


main("https://www.newegg.com/p/1XV-000E-00331")

Output:

('1,268', '.90')
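The two capture groups come back as separate strings, so you will probably want to join them. A minimal sketch, assuming the tuple always has the `(whole, cents)` shape shown in the output above; `join_price_parts` is a name made up for this example:

```python
from decimal import Decimal


def join_price_parts(parts):
    """Combine a ('1,268', '.90') style tuple into a single Decimal."""
    whole, cents = parts
    # Remove thousands separators, then append the fractional part.
    return Decimal(whole.replace(",", "") + cents)


print(join_price_parts(("1,268", ".90")))  # 1268.90
```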