How do I get the correct URL for web scraping?


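I built a bot to scrape sneaker prices, but I had trouble retrieving the price. Someone here helped me find the right URL for the request, and the scraping worked.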

Original product URL: https://www.vans.com.br/tenis-ultrarange-rapidweld-black-white/p/1003500430051U?gad_source=1
URL modified for scraping: .com.br/arezzocoocc/v2/vans/products/1003500430051U/dynamic-product-fields?fields=DYNAMIC_FIELDS_PDP
New URL I want to modify: https://www.nike.com.br/tenis-nike-pegasus-40-masculino-025803.html?cor=ID

This is the code I use to look up the price of these sneakers:

import requests
import smtplib
import email.message
import ssl

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0"
}
def get_product_data(url, number):
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()
        return next((c for c in data.get("colorOptions", []) if c["code"] == number), None)
    except requests.RequestException as e:
        print(f"Error fetching product data for {number}: {e}")
        return None

def send_email(product_name, product_price, receiver_email):
    # Build a minimal plain-text message: a Subject header, a blank line, then the body.
    subject = "Price Drop Alert!"
    body_msg = f"The price of {product_name} has dropped to {product_price}."
    message = f"Subject: {subject}\n\n{body_msg}"

    sender = '<REDACTED>'
    password = '<REDACTED>'

    context = ssl.create_default_context()

    # Gmail's SMTP-over-SSL endpoint listens on port 465.
    with smtplib.SMTP_SSL('smtp.gmail.com', 465, context=context) as server:
        server.login(sender, password)
        # Send to the address passed in, rather than a hardcoded receiver.
        server.sendmail(sender, receiver_email, message.encode('utf-8'))

number1 = "1003500430051U"
number2 = "1002001070011U"

url1 = f"https://www.vans.com.br/arezzocoocc/v2/vans/products/{number1}/dynamic-product-fields?fields=DYNAMIC_FIELDS_PDP"
url2 = f"https://www.vans.com.br/arezzocoocc/v2/vans/products/{number2}/dynamic-product-fields?fields=DYNAMIC_FIELDS_PDP"

product1 = get_product_data(url1, number1)
product2 = get_product_data(url2, number2)

productprice1 = None  # stays None if the product or its price is missing
if product1 and "price" in product1:
    productprice1 = product1["price"]["value"]
    print(product1["name"], productprice1)

if product2 and "price" in product2:
    print(product2["name"], product2["price"]["value"])

# Second lookup: fetch the raw JSON again and read the price from the top level.
data1, data2 = {}, {}  # empty fallbacks so the loops below are skipped if a request fails
try:
    data1 = requests.get(url1, headers=headers).json()
    data2 = requests.get(url2, headers=headers).json()
except requests.RequestException as e:
    print(f"Error fetching product data: {e}")

for c in data1.get("colorOptions", []):
    if c["code"] == number1:
        productprice1 = data1["price"]["value"]
        print(c["name"], productprice1)
        break

for c in data2.get("colorOptions", []):
    if c["code"] == number2:
        print(c["name"], data2["price"]["value"])
        break

if productprice1 and productprice1 < 600:
    send_email(product1["name"], productprice1, '[email protected]')

I tried changing the new URL to: https://www.nike.com.br/tenis-nike-pegasus-40-masculino-025803/dynamic-product-fields?fields=DYNAMIC_FIELDS_PDP

But it doesn't work. If someone could show me how to find this for any URL, it would really help my scraper, so that I know how to look up any product when I need to.

python web-scraping url python-requests
1 Answer

Since this is a different website (Nike, not Vans), don't you need authorization to use that site?

You also pass a product-code parameter into the Vans URLs:

number1 = "1003500430051U"
number2 = "1002001070011U"
url1 = f"https://www.vans.com.br/arezzocoocc/v2/vans/products/{number1}/dynamic-product-fields?fields=DYNAMIC_FIELDS_PDP"
url2 = f"https://www.vans.com.br/arezzocoocc/v2/vans/products/{number2}/dynamic-product-fields?fields=DYNAMIC_FIELDS_PDP"

The Nike URL, by contrast, has no such parameter:

number3 = None  # unknown: no equivalent product code is given for the Nike URL
url3 = "https://www.nike.com.br/tenis-nike-pegasus-40-masculino-025803.html?cor=ID"
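On the Vans site, the parameter is the product code that follows `/p/` in the product page URL from the question. Here is a minimal sketch of deriving the endpoint URL from a product page URL. It assumes every Vans product URL follows the same `/p/<code>` pattern as the single example in the question, which may not hold. The helper name `vans_api_url` and the constant `API_TEMPLATE` are hypothetical.

```python
from urllib.parse import urlparse

# Endpoint template copied from the question's code.
API_TEMPLATE = (
    "https://www.vans.com.br/arezzocoocc/v2/vans/products/"
    "{code}/dynamic-product-fields?fields=DYNAMIC_FIELDS_PDP"
)

def vans_api_url(product_page_url):
    """Return the JSON endpoint URL for a Vans product page URL, or None."""
    # Split the path into segments; urlparse keeps the query string separate.
    segments = [s for s in urlparse(product_page_url).path.split("/") if s]
    # Assumption: the product code is the segment right after "p".
    if "p" in segments[:-1]:
        code = segments[segments.index("p") + 1]
        return API_TEMPLATE.format(code=code)
    # No "/p/<code>" part found, as with the Nike URL.
    return None
```

The Nike URL has no `/p/<code>` segment, so the helper returns `None` for it. That is another sign that the Vans endpoint pattern cannot simply be carried over to a different site.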

Changing these lines so that the second call uses url3:

product1 = get_product_data(url1, number1)
product2 = get_product_data(url3, number2)

url3 = f"https://www.nike.com.br/tenis-nike-pegasus-40-masculino-025803.html?cor=ID"

produces this error:

Error fetching product data for 1002001070011U: 403 Client Error: Forbidden for url: https://www.nike.com.br/tenis-nike-pegasus-40-masculino-025803.html?cor=ID
Tennis Ultrarange Rapidweld Black White 549.99
Tennis Old Skool Black White 399.99

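A 403 Forbidden status means the server understood the request but refused to fulfil it, so this is an authorization error. You may need an API key or some other form of authorization from Nike.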
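Before trying to parse a response as JSON, it can help to check what actually came back. Below is a minimal sketch of such a check, written as a pure function so it can be tried without a network call. The name `describe_response` and the wording of its messages are hypothetical. Note also that a page URL ending in `.html` will typically return HTML rather than JSON, so `response.json()` would probably fail on it even with a 200 status.

```python
def describe_response(status_code, content_type):
    """Return a short diagnosis for an HTTP status code and Content-Type."""
    # 401 and 403 both mean the server refused the request.
    if status_code in (401, 403):
        return "refused: the server wants authorization or is blocking the client"
    # Any other 4xx or 5xx status is a failed request.
    if status_code >= 400:
        return f"failed with HTTP {status_code}"
    # A successful response is only parseable if the body is JSON.
    if "json" not in content_type.lower():
        return "not JSON: response.json() would fail on this body"
    return "ok: JSON body"
```

With `requests`, this could be called as `describe_response(response.status_code, response.headers.get("Content-Type", ""))` before `raise_for_status()` runs.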
