I'm trying to scrape a web page with Selenium, but I can't extract all the relevant information I need

I'm new to Selenium and am trying to extract information from a page, but I can't get all of the fields I need.

Here is my code:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()


driver.get("https://www.morphmarket.com/all/c/reptiles/pythons/ball-pythons")
time.sleep(5)

snakes = driver.find_elements(By.CSS_SELECTOR,"a.animalCard--avL0R")

link = snakes[0].get_attribute("href")
driver.get(snakes[0].get_attribute("href"))
time.sleep(5)
genes= driver.find_element(By.CSS_SELECTOR, "h1.animalTitle--cH6qE")
print(genes.text)
snake=driver.find_element(By.CSS_SELECTOR,"h2.animalSubTitle--mhYId")
print(snake.text)
price = driver.find_element(By.CSS_SELECTOR, "h1.salePrice--qNIIs")
print(price.text)
#sex = driver.find_element(By.TAG_NAME, "span")
Birth = driver.find_elements(By.CSS_SELECTOR,"div.labelValueContainer--z1CP3")
print(Birth[1].text)
print(Birth[3].text)
print(Birth[4].text)
print(Birth[5].text)
print(Birth[6].text)
print(Birth[7].text)
print(Birth[8].text)
print(Birth[9].text)
print(Birth[10].text)
print(Birth[11].text)

Company= driver.find_element(By.CSS_SELECTOR, "h4.title--qLioF")
print(Company.text)

Location=driver.find_element(By.CSS_SELECTOR,"p.location--TtVtP")
print(Location.text)
membership= driver.find_element(By.CSS_SELECTOR, "span")

I can extract some of the information, but how do I extract the sex, company, location and membership details from the page?

python selenium-webdriver web-scraping
2 Answers
0
votes

You can use the site's Ajax API to get the results, for example:

import requests

api_url = "https://www.morphmarket.com/api/v1/listings/"

params = {
    "category": "bps",
    "page": "1",
    "page_size": "24",
    "state": "for_sale",
    "view": "grid",
}

data = requests.get(api_url, params=params).json()
# print(data)

for r in data["results"]:
    print(f"{r['title'][:50]:<50} {r['price']}")

Prints:

Sacred                                             850.0
Pied 66% Het Clown                                 450.0
Clown 66% Het Pied                                 450.0
Pastel Enchi Freeway                               500.0
Yellow Belly Het Dg Het Hypo Het Pied              1200.0
2021 0.1 Pastel Chocolate Enchi Desert Ghost Hypo  4800.0
Pastel Stranger 50% Het Clown                      1250.0
Super Banana Enchi Leopard                         800.0
Pastel Stranger 50% Het Clown                      1250.0
Pinstripe Enchi Het Dg Het Hypo Het Pied           400.0
Ultramel Banana 66% Poss Het GeneticStripe         498.0
Black Head Fire Specter                            300.0
Super Pastel Leopard Fire Clown                    450.0
Pastel Leopard Stranger 50% Het Clown              1650.0
Pastel Mahogany Super Redstripe 50% Double Het Clo 1250.0
Pastel Spider Clown 66% Het Axanthic               350.0
Pastel Specter Black Head                          375.0
Orange Dream Cypress Mojave Pastel Probable Fire A 450.0
Coral Glow Hidden Gene Woma Granite Enchi Odium Fa 500.0
Pastel Super Ghi 100% Het Clown                    350.0
Yb Scarecrow                                       5900.0
Cypress Fire Het Clown                             550.0
Ghost Het Pied                                     100.0
Clown                                              300.0
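
If you need more than the first page, the request above can be wrapped in a small helper. This is only a sketch: `fetch_page` and `listing_rows` are hypothetical names, and it assumes the `page` parameter simply increments and that each result carries the `title` and `price` keys used above. The endpoint is not documented here, so treat those as assumptions.

```python
API_URL = "https://www.morphmarket.com/api/v1/listings/"


def fetch_page(page, page_size=24, timeout=10):
    """Fetch one page of listings (hypothetical helper; raises on HTTP errors)."""
    # Imported here so listing_rows() can be used without the dependency.
    import requests

    params = {
        "category": "bps",
        "page": str(page),
        "page_size": str(page_size),
        "state": "for_sale",
        "view": "grid",
    }
    response = requests.get(API_URL, params=params, timeout=timeout)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    return response.json()


def listing_rows(data):
    """Return (title, price) pairs from a parsed response, tolerating missing keys."""
    return [(r.get("title", ""), r.get("price")) for r in data.get("results", [])]


# Offline example with hand-written data shaped like the response used above.
sample = {"results": [{"title": "Clown", "price": 300.0}, {"title": "Sacred"}]}
rows = listing_rows(sample)
for title, price in rows:
    print(f"{title[:50]:<50} {price}")
```

Against the live site you would call `listing_rows(fetch_page(1))`. Using `.get()` means a listing without a price yields `None` rather than a `KeyError`.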

0
votes

Andrej Kesely's solution is the fast and ideal one. However, if you specifically need to scrape the page with Selenium, see the Selenium solution below.

Use explicit waits instead of time.sleep(). See the optimized code below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome()
driver.get("https://www.morphmarket.com/all/c/reptiles/pythons/ball-pythons")
wait = WebDriverWait(driver,10)

snakes = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"a.animalCard--avL0R")))
link = snakes[0].get_attribute("href")
driver.get(link)

# These headings are only read, not clicked, so waiting for visibility is enough.
genes = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1.animalTitle--cH6qE")))
print(genes.text)

snake = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h2.animalSubTitle--mhYId")))
print(snake.text)

price = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1.salePrice--qNIIs")))
print(price.text)

Birth = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"div.labelValueContainer--z1CP3")))
for birth in Birth:
    print(birth.text)

Company_info = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "(//div[@class='infoWrapper--O_L9E'])[2]")))
for element in Company_info:
    print(element.text)

Console output:

Enchi Pinstripe Het Dg Het Hypo Het Pied
Ball Pythons Baby
$1,200.00
Sex:
Traits:
Enchi
Pinstripe
Het Desert Ghost
Het Hypo
Het Piebald
Origin:
Self Produced
Birth:
2022
Weight:
280g
Diet:
Frozen/Thawed Rat
Shipping:
Free
Shipping Details:
Regional Shipping
Animal ID:
23-115-19
First Posted:
12/18/23
Last Renewed:
02/28/24
Last Updated:
02/28/24
ML Exotics
5.0
(126)
Taunton, Massachusetts
Pro Member

Process finished with exit code 0
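
The label/value lines in that output can be grouped into a dictionary, which makes individual fields such as Birth or Weight easy to look up. The sketch below is only an illustration: `parse_label_values` is a hypothetical helper, and it assumes every label ends with a colon and no value does. Note that Sex: is followed by no text in the output above, so the value may be rendered as something other than plain text (an assumption on my part); the helper just records an empty list for it.

```python
def parse_label_values(lines):
    """Group each 'Label:' line with the value lines that follow it."""
    fields = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A new label starts; it may end up with zero values.
            current = line[:-1]
            fields[current] = []
        elif current is not None:
            fields[current].append(line)
    return fields


# A few lines copied from the console output above.
sample = """Sex:
Traits:
Enchi
Pinstripe
Origin:
Self Produced
Birth:
2022""".splitlines()

fields = parse_label_values(sample)
print(fields["Traits"])
print(fields["Birth"])
```

With Selenium you could feed it the text of each container, for example `parse_label_values("\n".join(b.text for b in Birth).splitlines())`, using the `Birth` list from the code above.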