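I am trying to scrape data from the listings on a real estate website and store the values (e.g. price, address) in an object, one per listing. To do this, I first find all the parent div elements, store them in a list, and then look up each individual child element while iterating over that list.
I am not sure why, but for the last two elements the values are identical across all objects (even though they differ on the website). This probably means Selenium is locating the same element for every object, even though I am trying to locate only the children of the current parent element in each iteration.
I tried the XPaths in the Chrome DevTools search and found that each one matches all of the elements I am trying to locate individually, so the problem is (most likely) not caused by my XPath syntax.
No StaleElementReferenceException or other exception is raised.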
WebDriverWait(driver, 10, ignored_exceptions=ignored_exceptions).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='list-item-content']")))
offer_listings = driver.find_elements(By.XPATH, "//div[@class='list-item-content']")
for offer in offer_listings:
    address = offer.find_element(By.XPATH, ".//span[@class='address-part ng-binding']").text
    typology = offer.find_element(By.XPATH, ".//span[@translate='aantalMedebewonersLabel' or @ng-if='::object.isZelfstandig']").text
    price = offer.find_element(By.XPATH, ".//span[@class='prijs ng-binding ng-scope']").text
    subsidy_possible = offer.find_element(By.XPATH, "./span[@ng-bind-html='::object.huurtoeslagVoorwaarde.localizedIconText']").text
    available_from = offer.find_element(By.XPATH, ".//span[@ng-if='::object.availableFromDate']").text
    available_until = offer.find_element(By.XPATH, ".//div[@class='header-informatie ng-binding ng-scope']").text
    reactions = offer.find_element(By.XPATH, ".//span[@ng-if='object.numberOfReactions !== null']").text
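I tried making my XPaths more specific by adding multiple attributes and values for the elements I could not locate.
I expected Selenium to search for the XPath only within the parent div, but apparently that is not what it does, because otherwise it would not return the same values as elements in other listings.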
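For what it's worth, a locator that starts with `.//` is evaluated relative to the element it is called on, so it should not reach into other listings. Below is a minimal sketch of that scoping rule using Python's standard-library `xml.etree.ElementTree` rather than Selenium, so that it runs without a browser. The markup and the class names `listing` and `price` are made up for illustration and are not taken from the real site. Searching from the document root on every iteration reproduces the "same value for every listing" symptom, while searching from the current parent does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the listing page (not the real site).
PAGE = """
<body>
  <div class="listing"><p><span class="price">500</span></p></div>
  <div class="listing"><p><span class="price">600</span></p></div>
  <div class="listing"><p><span class="price">700</span></p></div>
</body>
"""

root = ET.fromstring(PAGE)
listings = root.findall(".//div[@class='listing']")

# Searching from the root inside the loop always returns the first match
# in the whole document, so every listing ends up with the same value.
from_root = [root.find(".//span[@class='price']").text for _ in listings]

# Searching from the current parent only looks at that parent's descendants.
scoped = [listing.find(".//span[@class='price']").text for listing in listings]

print(from_root)  # ['500', '500', '500']
print(scoped)     # ['500', '600', '700']
```

If the real page behaves like the `from_root` case, it may be worth checking whether the parent locator really selects one div per listing, or whether the values had simply not finished rendering when they were read.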
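I made a few adjustments to your existing code and was able to extract all the details as expected. I used a list and a dict (hash) to store the extracted values.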
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Run Chrome without opening a browser window.
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')

url = "https://www.room.nl/en/offerings/to-rent#?gesorteerd-op=prijs%2B&toekenning=3&locatie=Regio%2BAmsterdam"
driver = webdriver.Chrome(options=chrome_options)
driver.get(url)

# Wait until the listing containers are present, then collect them.
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='list-item-content']")))
offer_listings = driver.find_elements(By.XPATH, "//div[@class='list-item-content']")

# Crude pause to give the page time to finish rendering the values.
time.sleep(5)

offers_list = []
for offer in offer_listings:
    # One dict per listing; every XPath starts with ".//" so it is
    # evaluated relative to the current listing element.
    info_hash = {}
    info_hash["address"] = offer.find_element(By.XPATH, ".//span[@class='address-part ng-binding']").text
    info_hash["typology"] = offer.find_element(By.XPATH, ".//span[@translate='aantalMedebewonersLabel' or @ng-if='::object.isZelfstandig']").text
    info_hash["price"] = offer.find_element(By.XPATH, ".//span[@class='prijs ng-binding ng-scope']").text
    info_hash["subsidy_possible"] = offer.find_element(By.XPATH, ".//span[@ng-bind-html='::object.huurtoeslagVoorwaarde.localizedIconText']").text
    info_hash["available_from"] = offer.find_element(By.XPATH, ".//span[@ng-if='::object.availableFromDate']").text
    info_hash["available_until"] = offer.find_element(By.XPATH, ".//div[@class='header-informatie ng-binding ng-scope']").text
    info_hash["reactions"] = offer.find_element(By.XPATH, ".//span[@ng-if='object.numberOfReactions !== null']").text
    offers_list.append(info_hash)

print(offers_list)
driver.quit()
Output:
[{'address': 'Uilenstede 216-1508', 'typology': '14 residents', 'price': '€523.41 p/m', 'subsidy_possible': 'Subsidy ≥ 18yr', 'available_from': 'FROM 1 MAY 2024', 'available_until': 'PLEASE NOTE: temporary rental until 05-08-2024!', 'reactions': '57'},
{'address': 'Cornelis Lelylaan 3-H17', 'typology': 'Self-contained', 'price': '€618.22 p/m', 'subsidy_possible': 'Subsidy ≥ 18yr', 'available_from': 'FROM 1 MAY 2024', 'available_until': 'Temporary! 09-08-2024', 'reactions': '179'},
{'address': 'Cornelis Lelylaan 5-D7', 'typology': 'Self-contained', 'price': '€889.80 p/m', 'subsidy_possible': 'Subsidy ≥ 23yr', 'available_from': 'FROM 1 MAY 2024', 'available_until': 'PLEASE NOTE: TEMPORARY RENTAL UNTIL 12-08-2024!', 'reactions': '47'}]
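One visible difference between the question's code and the answer's code is the `subsidy_possible` locator: the question uses `./span[...]`, which matches only direct children of the parent div, while the answer uses `.//span[...]`, which matches descendants at any depth. Here is a small sketch of that distinction, using the standard-library `xml.etree.ElementTree` instead of Selenium and made-up markup (the `address` and `subsidy` class names are hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical listing: one span is a direct child, the other is nested.
listing = ET.fromstring(
    "<div class='listing'>"
    "<span class='address'>Main Street 1</span>"
    "<p><span class='subsidy'>Subsidy possible</span></p>"
    "</div>"
)

# './span' only considers direct children of the div.
direct_address = listing.find("./span[@class='address']")
direct_subsidy = listing.find("./span[@class='subsidy']")

# './/span' considers descendants at any depth.
nested_subsidy = listing.find(".//span[@class='subsidy']")

print(direct_address.text)  # Main Street 1
print(direct_subsidy)       # None
print(nested_subsidy.text)  # Subsidy possible
```

Note that ElementTree returns `None` for a miss, whereas Selenium's `find_element` raises `NoSuchElementException`, so the two libraries report the same mismatch differently.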