I'm currently building a Python script that scrapes trainer prices from the Nike website and pushes them into a CSV file. Initially the code targeted the element containing the price data, but after that failed I switched to a CSS selector, since the price on Nike product pages is wrapped in CSS. When I ran the script, though, it still couldn't extract the price. Stumped, I modified the script to account for elements loaded dynamically by JavaScript after the initial page load. Even after all that, I still couldn't push the products and prices to a CSV file, so I rewrote the code to extract the data with an XPath. However, I'm still unable to extract the price data and write it to the CSV file.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import numpy as np
import pandas as pd

# Get the website using the Chrome web driver
browser = webdriver.Chrome()

# Create an empty DataFrame to store the results
df = pd.DataFrame(columns=["Product", "Price"])

# Define a list of websites to scrape
websites = [
    'https://www.nike.com/gb/t/dunk-low-retro-shoe-Kd1wZr/DD1391-103',
    'https://www.nike.com/gb/t/dunk-low-retro-shoe-QgD9Gv/DD1391-100',
    'https://www.nike.com/gb/t/dunk-low-retro-shoes-p6gmkm/DV0833-400'
]

# Loop through the websites
for website in websites:
    browser.get(website)
    try:
        # Attempt to find the price element by its xPath
        price = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="PDP"]/div[2]/div/div[4]/div[1]/div/div[2]/div/div/div/div/div')))
        # If found, extract the text and add to the DataFrame
        df = df.append({"Product": website, "Price": price.text}, ignore_index=True)
        print("Price for", website, ":", price.text)
    except:
        # If the element is not found, print an error message
        print("Price not found for", website)

# Close the browser
browser.quit()

# Save data frame data into an Excel CSV file
df.to_csv(r'PriceList.csv', index=False)
df = df.append({"Product": website, "Price": price.text}, ignore_index=True)

append has been removed from pandas. Instead, you can use _append:

df = df._append({"Product": website, "Price": price.text}, ignore_index=True)

For some reason, text doesn't seem to work (on my machine); you can use get_attribute("innerHTML") instead.
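Note that _append is a private pandas method and could change or disappear in a future release. A more future-proof pattern is to collect each row as a dict and build the DataFrame once at the end; a minimal sketch with made-up sample data standing in for the scraped values:

```python
import pandas as pd

# Collect one dict per scraped product, then build the DataFrame once.
# The URLs and price strings here are placeholders for the scraped values.
rows = []
for website, price_text in [
    ("https://example.com/shoe-a", "£99.95"),
    ("https://example.com/shoe-b", "£109.95"),
]:
    rows.append({"Product": website, "Price": price_text})

df = pd.DataFrame(rows, columns=["Product", "Price"])
print(df.shape)  # (2, 2)
```

Appending to a list is O(1) per row, whereas rebuilding a DataFrame on every iteration copies all previous rows each time, so this also scales better for long product lists.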
Here is the final code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from chromedriver_py import binary_path
import time
import numpy as np
import pandas as pd

svc = Service(executable_path=binary_path)

# Get the website using the Chrome web driver
browser = webdriver.Chrome(service=svc)

# Create an empty DataFrame to store the results
df = pd.DataFrame(columns=["Product", "Price"])

# Define a list of websites to scrape
websites = [
    'https://www.nike.com/gb/t/dunk-low-retro-shoe-Kd1wZr/DD1391-103',
    'https://www.nike.com/gb/t/dunk-low-retro-shoe-QgD9Gv/DD1391-100',
    'https://www.nike.com/gb/t/dunk-low-retro-shoes-p6gmkm/DV0833-400'
]

# Loop through the websites
for website in websites:
    browser.get(website)
    try:
        # Attempt to find the price element by its xPath
        price = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="PDP"]/div[2]/div/div[4]/div[1]/div/div[2]/div/div/div/div/div')))
        # If found, extract the text and add to the DataFrame
        df = df._append({"Product": website, "Price": price.get_attribute('innerHTML')}, ignore_index=True)
        print("Price for", website, ":", price.get_attribute('innerHTML'))
    except:
        # If the element is not found, print an error message
        print("Price not found for", website)

# Close the browser
browser.quit()

# Save data frame data into an Excel CSV file
df.to_csv(r'PriceList.csv', index=False)
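One caveat with get_attribute("innerHTML"): it returns the element's raw markup, so the "Price" column may contain a string like "£99.95" or even nested tags rather than a clean number. If you want numeric values in the CSV, a small helper can pull the figure out; a hedged sketch (the sample strings are assumptions about what the page returns, not verified Nike markup):

```python
import re

def parse_price(raw: str) -> float:
    """Extract the first decimal number from a price string or HTML fragment."""
    match = re.search(r"\d+(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"no price found in {raw!r}")
    return float(match.group())

print(parse_price("£99.95"))                 # 99.95
print(parse_price("<span>£109.95</span>"))   # 109.95
```

You could apply this to the DataFrame before saving, e.g. df["Price"] = df["Price"].map(parse_price), so PriceList.csv holds plain numbers.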