Several values stay the same when locating child elements of different parent divs with Selenium (Python)


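I'm trying to scrape data from the listings on a real-estate website and store the values (price, address, and so on) in an object, one per listing. To do that, I first find all the parent div elements, store them in a list, and then look up each individual child element while iterating over that list.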

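I'm not sure why, but for the last two elements the value is the same for every object, even though the values differ on the website. That probably means Selenium is locating the same element for all objects, even though I'm trying to locate only the children of the current parent element in each iteration.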

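I tried searching with the same XPath in Chrome DevTools and found that it matches every element I'm trying to locate individually, so the problem is (most likely) not caused by my XPath syntax.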

No StaleElementReferenceException or any other exception is raised.

WebDriverWait(driver, 10, ignored_exceptions=ignored_exceptions).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='list-item-content']")))
offer_listings = driver.find_elements(By.XPATH, "//div[@class='list-item-content']")

for offer in offer_listings:
    address = offer.find_element(By.XPATH, ".//span[@class='address-part ng-binding']").text
    typology = offer.find_element(By.XPATH, ".//span[@translate='aantalMedebewonersLabel' or @ng-if='::object.isZelfstandig']").text
    price = offer.find_element(By.XPATH, ".//span[@class='prijs ng-binding ng-scope']").text
    subsidy_possible = offer.find_element(By.XPATH, "./span[@ng-bind-html='::object.huurtoeslagVoorwaarde.localizedIconText']").text
    available_from = offer.find_element(By.XPATH, ".//span[@ng-if='::object.availableFromDate']").text
    available_until = offer.find_element(By.XPATH, ".//div[@class='header-informatie ng-binding ng-scope']").text
    reactions = offer.find_element(By.XPATH, ".//span[@ng-if='object.numberOfReactions !== null']").text

I tried making my XPath more specific by adding more attributes and values for the elements I can't locate correctly.

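I expected Selenium to evaluate the XPath only inside the parent div, but apparently that is not what happens, because otherwise an object would not end up with the same values as the elements in other listings.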
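To illustrate the scoping rule this expectation relies on, here is a minimal sketch that runs without a browser. It uses the standard library's `xml.etree.ElementTree` as a stand-in for Selenium, and the markup and the `price` class are made up for the example. It only shows how the path prefixes behave: `.//` matches any descendant of the current element, while `./` matches direct children only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two listings, each with its own nested price span.
MARKUP = """
<root>
  <div class="list-item-content">
    <p><span class="price">523.41</span></p>
  </div>
  <div class="list-item-content">
    <p><span class="price">618.22</span></p>
  </div>
</root>
"""

root = ET.fromstring(MARKUP)
listings = root.findall(".//div[@class='list-item-content']")

# ".//" searches the descendants of the current listing only,
# so each listing yields its own price.
scoped_prices = [div.find(".//span[@class='price']").text for div in listings]

# "./" looks at direct children only; the span sits inside <p>,
# so nothing matches and find() returns None.
direct_children = [div.find("./span[@class='price']") for div in listings]

print(scoped_prices)    # ['523.41', '618.22']
print(direct_children)  # [None, None]
```

In Selenium the behaviour differs in two details worth keeping in mind: `find_element` raises `NoSuchElementException` instead of returning `None` when nothing matches, and an XPath that starts with `//` is evaluated against the whole document even when it is called on an element.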

python selenium-webdriver web-scraping selenium-chromedriver
1 Answer

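I made a few adjustments to your existing code and was able to extract all the details as expected. I used a list and a dictionary (hash) to store the extracted values.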

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# Run Chrome without opening a browser window
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')

url = "https://www.room.nl/en/offerings/to-rent#?gesorteerd-op=prijs%2B&toekenning=3&locatie=Regio%2BAmsterdam"
driver = webdriver.Chrome(options=chrome_options)
driver.get(url)

# Wait until the listing containers are present, then collect all of them
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='list-item-content']")))
offer_listings = driver.find_elements(By.XPATH, "//div[@class='list-item-content']")
# Fixed pause before reading the text of each listing
time.sleep(5)

offers_list = []
for offer in offer_listings:
    # One dictionary per listing; every XPath starts with ".//" so it is
    # evaluated relative to the current listing element
    info_hash = {}
    info_hash["address"] = offer.find_element(By.XPATH, ".//span[@class='address-part ng-binding']").text
    info_hash["typology"] = offer.find_element(By.XPATH, ".//span[@translate='aantalMedebewonersLabel' or @ng-if='::object.isZelfstandig']").text
    info_hash["price"] = offer.find_element(By.XPATH, ".//span[@class='prijs ng-binding ng-scope']").text
    info_hash["subsidy_possible"] = offer.find_element(By.XPATH, ".//span[@ng-bind-html='::object.huurtoeslagVoorwaarde.localizedIconText']").text
    info_hash["available_from"] = offer.find_element(By.XPATH, ".//span[@ng-if='::object.availableFromDate']").text
    info_hash["available_until"] = offer.find_element(By.XPATH, ".//div[@class='header-informatie ng-binding ng-scope']").text
    info_hash["reactions"] = offer.find_element(By.XPATH, ".//span[@ng-if='object.numberOfReactions !== null']").text
    offers_list.append(info_hash)

print(offers_list)
driver.quit()

Output:

[{'address': 'Uilenstede 216-1508', 'typology': '14 residents', 'price': '€523.41  p/m', 'subsidy_possible': 'Subsidy ≥ 18yr', 'available_from': 'FROM 1 MAY 2024', 'available_until': 'PLEASE NOTE: temporary rental until 05-08-2024!', 'reactions': '57'},
{'address': 'Cornelis Lelylaan 3-H17', 'typology': 'Self-contained', 'price': '€618.22  p/m', 'subsidy_possible': 'Subsidy ≥ 18yr', 'available_from': 'FROM 1 MAY 2024', 'available_until': 'Temporary! 09-08-2024', 'reactions': '179'},
{'address': 'Cornelis Lelylaan 5-D7', 'typology': 'Self-contained', 'price': '€889.80  p/m', 'subsidy_possible': 'Subsidy ≥ 23yr', 'available_from': 'FROM 1 MAY 2024', 'available_until': 'PLEASE NOTE: TEMPORARY RENTAL UNTIL 12-08-2024!', 'reactions': '47'}]
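
When checking a scrape like this for the symptom described in the question (a field that comes out identical for every listing), a small diagnostic can help. The sketch below is only an illustration: `constant_fields` is a hypothetical helper name, and the records are a trimmed copy of three fields from the output printed above.

```python
def constant_fields(records):
    """Return the keys whose value is identical in every record."""
    if not records:
        return []
    first = records[0]
    return [
        key for key in first
        if all(rec.get(key) == first[key] for rec in records)
    ]


# Three fields copied from the printed output above
offers = [
    {"address": "Uilenstede 216-1508", "available_from": "FROM 1 MAY 2024", "reactions": "57"},
    {"address": "Cornelis Lelylaan 3-H17", "available_from": "FROM 1 MAY 2024", "reactions": "179"},
    {"address": "Cornelis Lelylaan 5-D7", "available_from": "FROM 1 MAY 2024", "reactions": "47"},
]

suspicious = constant_fields(offers)
print(suspicious)  # ['available_from']
```

A field flagged this way is not necessarily wrong: several listings can genuinely share the same availability date. It only marks the fields worth comparing against the page by hand.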