Is there a reason I can't extract price data from the Nike website using Selenium and Python?


I'm currently building a Python script that scrapes trainer prices from the Nike website and pushes them into a CSV file. Initially the code targeted the element that holds the price data, but when that failed I switched to a CSS selector, since the price on a Nike product page is rendered via CSS. When I ran the script, it still couldn't extract the price data. I was stumped, so I modified the script to account for elements that are loaded dynamically by JavaScript after the initial page load. Even after that, I still couldn't push the prices and products into a CSV file. I then decided to rewrite the extraction to use XPath. However, I still can't extract the price data and push it to the CSV file.


from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import numpy as np
import pandas as pd
# Get the website using the Chrome web driver
browser = webdriver.Chrome()

# Create an empty DataFrame to store the results
df = pd.DataFrame(columns=["Product", "Price"])

# Define a list of websites to scrape
websites = [
    'https://www.nike.com/gb/t/dunk-low-retro-shoe-Kd1wZr/DD1391-103',
    'https://www.nike.com/gb/t/dunk-low-retro-shoe-QgD9Gv/DD1391-100',
    'https://www.nike.com/gb/t/dunk-low-retro-shoes-p6gmkm/DV0833-400'
]

# Loop through the websites
for website in websites:
    browser.get(website)
    try:
        # Attempt to find the price element by its xPath
        price = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="PDP"]/div[2]/div/div[4]/div[1]/div/div[2]/div/div/div/div/div')))
        # If found, extract the text and add to the DataFrame
        df = df.append({"Product": website, "Price": price.text}, ignore_index=True)
        print("Price for", website, ":", price.text)
    except:
        # If the element is not found, print an error message
        print("Price not found for", website)

# Close the browser
browser.quit()

# Save data frame data into an Excel CSV file
df.to_csv(r'PriceList.csv', index=False)

The code should grab the element containing the price data and push it into a CSV file. A visual aid of what the code should do

python css selenium-webdriver xpath web-crawler
1 Answer
df = df.append({"Product": website, "Price": price.text}, ignore_index=True)

`append` has been removed from pandas (deprecated in 1.x and removed in 2.0). Instead, you can use `_append`:

df = df._append({"Product": website, "Price": price.text}, ignore_index=True)
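Note that `_append` is private pandas API and could disappear in a future release. A more future-proof pattern (a sketch, not part of the original answer; the URLs and prices below are placeholders) is to collect each scraped row in a plain list and build the DataFrame once at the end:

```python
import pandas as pd

# Collect rows as dicts inside the scraping loop instead of growing
# the DataFrame one row at a time.
rows = []
for website, price_text in [
    ("https://www.nike.com/gb/t/example-1", "£99.95"),
    ("https://www.nike.com/gb/t/example-2", "£109.95"),
]:
    rows.append({"Product": website, "Price": price_text})

# Build the DataFrame in one call; no append/_append needed.
df = pd.DataFrame(rows, columns=["Product", "Price"])
print(df)
```

This also avoids the quadratic cost of repeatedly copying the DataFrame on every append.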

For some reason, `text` doesn't seem to work (on my machine). Instead, you can use `get_attribute("innerHTML")`.

Here is the final code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from chromedriver_py import binary_path
import pandas as pd

svc = Service(executable_path=binary_path)

# Get the website using the Chrome web driver
browser = webdriver.Chrome(service=svc)

# Create an empty DataFrame to store the results
df = pd.DataFrame(columns=["Product", "Price"])

# Define a list of websites to scrape
websites = [
    'https://www.nike.com/gb/t/dunk-low-retro-shoe-Kd1wZr/DD1391-103',
    'https://www.nike.com/gb/t/dunk-low-retro-shoe-QgD9Gv/DD1391-100',
    'https://www.nike.com/gb/t/dunk-low-retro-shoes-p6gmkm/DV0833-400'
]

# Loop through the websites
for website in websites:
    browser.get(website)
    try:
        # Attempt to find the price element by its xPath
        price = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="PDP"]/div[2]/div/div[4]/div[1]/div/div[2]/div/div/div/div/div')))

        # If found, extract the text and add to the DataFrame
        df = df._append({"Product": website, "Price": price.get_attribute('innerHTML')}, ignore_index=True)
        print("Price for", website, ":", price.get_attribute('innerHTML'))
    except Exception:
        # If the element is not found, print an error message
        print("Price not found for", website)

# Close the browser
browser.quit()

# Save data frame data into an Excel CSV file
df.to_csv(r'PriceList.csv', index=False)
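As an aside, if the DataFrame exists only to produce the CSV, the standard-library csv module sidesteps the pandas append question entirely. A minimal sketch (the URLs and prices are placeholders standing in for the scraped values):

```python
import csv

# Placeholder (website, price) pairs; in the real script these would
# come from the Selenium scraping loop.
rows = [
    ("https://www.nike.com/gb/t/example-1", "£99.95"),
    ("https://www.nike.com/gb/t/example-2", "£109.95"),
]

# newline="" is required so csv.writer controls line endings itself.
with open("PriceList.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Product", "Price"])  # header row
    writer.writerows(rows)
```

This writes the same two-column file as the pandas version, without building a DataFrame.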