How to click every apartment on Airbnb using Selenium


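I'm building a scraper for Airbnb. The scraper should open an apartment, take a screenshot, go back, open the second apartment, take a screenshot, and so on.

Currently the scraper clicks the first apartment, takes a screenshot, goes back, and then opens the first apartment again instead of the next one. The HTML of the element I'm clicking (stored in the variable elements) is as follows: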

<a aria-hidden="true" tabindex="-1" class="rfexzly atm_9s_1ulexfb atm_7l_1j28jx2 atm_e2_1osqo2v dir dir-ltr" href="/rooms/9477823?category_tag=Tag%3A8678&amp;enable_m3_private_room=true&amp;photo_id=127120043&amp;check_in=2024-05-08&amp;check_out=2024-05-22&amp;source_impression_id=p3_1714848134_RZNZRm8eG%2FJ26pww&amp;previous_page_section_name=1000&amp;federated_search_id=efebfc50-8682-44ac-acee-e5d374cb3da4" rel="noopener noreferrer nofollow" target="listing_9477823"><div class="cjv59qb atm_mk_h2mmj6 atm_vy_1osqo2v atm_e2_1osqo2v dir dir-ltr"><div class="d1l1iq7v atm_9s_1o8liyq atm_vh_yfq0k3 atm_e2_88yjaz atm_vy_1r2rij0 atm_j6_t94yts bmwtyu7 atm_2m_1qred53 atm_2s_mgnkw2 dir dir-ltr" role="presentation" aria-hidden="true" style="--dls-liteimage-height: 100%; --dls-liteimage-width: 100%; --dls-liteimage-background-image: url('data:image/png;base64,null'); --dls-liteimage-background-size: cover;"><picture class=" dir dir-ltr"><source srcset="https://a0.muscache.com/im/pictures/73075622-b70f-426e-bf7e-f1d54885e049.jpg?im_w=720 1x" media="(min-width: 0px)"><img class="itu7ddv atm_e2_idpfg4 atm_vy_idpfg4 atm_mk_stnw88 atm_e2_1osqo2v__1lzdix4 atm_vy_1osqo2v__1lzdix4 i1cqnm0r atm_jp_pyzg9w atm_jr_nyqth1 i1de1kle atm_vh_yfq0k3 dir dir-ltr" aria-hidden="true" elementtiming="LCP-target" fetchpriority="high" loading="eager" src="https://a0.muscache.com/im/pictures/73075622-b70f-426e-bf7e-f1d54885e049.jpg?im_w=720" data-original-uri="https://a0.muscache.com/im/pictures/73075622-b70f-426e-bf7e-f1d54885e049.jpg?im_w=720" style="--dls-liteimage-object-fit: cover;"></picture><div class="rsb5yse atm_9s_1o8liyq atm_vh_yfq0k3 atm_e2_1osqo2v atm_vy_1osqo2v atm_9s_glywfm__1lzdix4 bmwtyu7 atm_2m_1qred53 atm_2s_mgnkw2 dqqltwe atm_2g_1isa5lx atm_2w_k6d6ah dir dir-ltr" style="--dls-liteimage-background-size: cover; --dls-liteimage-background-image: url(https://a0.muscache.com/im/pictures/73075622-b70f-426e-bf7e-f1d54885e049.jpg?im_w=720);"></div></div></div></a>

The part below is what needs fixing. Which part of the HTML element should I target so that every listing gets clicked?

#...rest of the code

elements = driver.find_elements(By.CSS_SELECTOR, "a[aria-hidden='true'][tabindex='-1'][class*='dir'][href*='/rooms/']")

for element in elements:
    element.click()

#...rest of the code

The full code is below:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
import time

URL = "https://www.airbnb.com/s/San-Francisco--California--United-States/homes?tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&flexible_trip_lengths%5B%5D=one_week&monthly_start_date=2024-06-01&monthly_length=3&monthly_end_date=2024-09-01&price_filter_input_type=0&channel=EXPLORE&query=San%20Francisco%2C%20California%2C%20United%20States&place_id=ChIJIQBpAG2ahYAR_6128GcTUEo&date_picker_type=calendar&checkin=2024-05-08&checkout=2024-05-22&source=structured_search_input_header&search_type=autocomplete_click"

driver = webdriver.Chrome()
driver.get(URL)
driver.maximize_window()

time.sleep(4)

try:
    accept_cookies = driver.find_element(By.XPATH, '//*[@id="react-application"]/div/div/div[1]/div/div[6]/section/div[2]/div[2]/button')
    accept_cookies.click()
except NoSuchElementException:
    print("No 'Accept cookies' found.")

elements = driver.find_elements(By.CSS_SELECTOR, "a[aria-hidden='true'][tabindex='-1'][class*='dir'][href*='/rooms/']")

for element in elements:
    element.click()

    # Switch to the newly opened tab
    driver.switch_to.window(driver.window_handles[-1])

    time.sleep(4)

    page_source = driver.page_source
    soup = BeautifulSoup(page_source, "html.parser")

    try:
        close_pop_up = driver.find_element(By.XPATH,
                                           '/html/body/div[9]/div/div/section/div/div/div[2]/div/div[1]/button')
        close_pop_up.click()
    except NoSuchElementException:
        print("No pop up element found.")

    apartment_name = soup.find('h1', class_='hpipapi').getText()

    time.sleep(1)

    driver.execute_script(f"window.scrollTo(0, 100);")

    driver.save_screenshot(f'screenshots/{apartment_name.replace(" ", "_")}.png')

    # Close the current tab
    driver.close()

    # Switch back to the main tab
    driver.switch_to.window(driver.window_handles[0])

driver.quit()
python selenium-webdriver
1 Answer

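Your code clicks the first apartment link, but because every click opens a new tab, focus keeps returning to the initial tab. As a result, only the first apartment is visited, again and again.

Here is how to fix it: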

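  • Instead of clicking directly, iterate over the apartment elements and collect each apartment's link, skipping duplicates.
  • Open each link from that list in a new tab.
  • Take the screenshot and close the tab before moving on to the next apartment link.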
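
The first step can be sketched as a small pure function. This is only an illustration under one assumption: that the same listing may appear several times in the results with different query strings, so comparing full hrefs would not catch every duplicate. The function name unique_listing_links and the shortened example URLs are hypothetical, not part of the original code.

```python
from urllib.parse import urlsplit


def unique_listing_links(hrefs):
    """Return one URL per listing, keeping the first occurrence of each."""
    seen = {}
    for href in hrefs:
        if not href:
            continue  # get_attribute("href") can return None
        path = urlsplit(href).path  # e.g. "/rooms/9477823", query string dropped
        if "/rooms/" not in path:
            continue  # not a listing link
        seen.setdefault(path, href)  # dicts preserve insertion order
    return list(seen.values())


# Hypothetical hrefs, shortened for readability.
links = unique_listing_links([
    "https://www.airbnb.com/rooms/9477823?photo_id=1",
    "https://www.airbnb.com/rooms/9477823?photo_id=2",
    "https://www.airbnb.com/rooms/111?photo_id=3",
    None,
])
print(links)
```

With Selenium, the input would be something like [e.get_attribute("href") for e in elements]. Keying on the URL path is a design choice: if two links to the same room should be treated as different pages (for example different dates), compare the full href instead.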

The complete fixed code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
import time

URL = "https://www.airbnb.com/s/San-Francisco--California--United-States/homes?tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&flexible_trip_lengths%5B%5D=one_week&monthly_start_date=2024-06-01&monthly_length=3&monthly_end_date=2024-09-01&price_filter_input_type=0&channel=EXPLORE&query=San%20Francisco%2C%20California%2C%20United%20States&place_id=ChIJIQBpAG2ahYAR_6128GcTUEo&date_picker_type=calendar&checkin=2024-05-08&checkout=2024-05-22&source=structured_search_input_header&search_type=autocomplete_click"

driver = webdriver.Chrome()
driver.maximize_window()
driver.get(URL)

time.sleep(4)

try:
    accept_cookies = driver.find_element(By.XPATH, '//*[@id="react-application"]/div/div/div[1]/div/div[6]/section/div[2]/div[2]/button')
    accept_cookies.click()
except NoSuchElementException:
    print("No 'Accept cookies' found.")

elements = driver.find_elements(By.CSS_SELECTOR, "a[aria-hidden='true'][tabindex='-1'][class*='dir'][href*='/rooms/']")


# Fix Starts here
print("Collecting apartment links...")
apartment_links = []
for element in elements:
    link = element.get_attribute("href")
    if link not in apartment_links:
        apartment_links.append(link)

print(apartment_links)

for link in apartment_links:
    driver.execute_script("window.open('');")
    driver.switch_to.window(driver.window_handles[-1])
    driver.get(link)  # Fix Ends here
    time.sleep(4)

    page_source = driver.page_source
    soup = BeautifulSoup(page_source, "html.parser")

    try:
        close_pop_up = driver.find_element(By.XPATH,
                                           '/html/body/div[9]/div/div/section/div/div/div[2]/div/div[1]/button')
        close_pop_up.click()
    except NoSuchElementException:
        print("No pop up element found.")

    apartment_name = soup.find('h1', class_='hpipapi').getText()

    time.sleep(1)

    driver.execute_script("window.scrollTo(0, 100);")

    driver.save_screenshot(f'screenshots/{apartment_name.replace(" ", "_")}.png')

    # Close the current tab
    driver.close()

    # Switch back to the main tab
    driver.switch_to.window(driver.window_handles[0])

driver.quit()
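
One thing the code above does not handle: the screenshot file name is built from the listing title by replacing spaces only, so a title that contains a slash, colon, or similar character can produce an invalid path. A minimal sketch of a stricter clean-up follows; the helper name safe_filename, its default value, and the sample title are assumptions for illustration.

```python
import re


def safe_filename(title, default="listing"):
    """Turn a listing title into a file name that is safe on common file systems."""
    # Replace each run of characters other than word characters,
    # dots, and hyphens with a single underscore.
    cleaned = re.sub(r"[^\w.-]+", "_", title.strip())
    # Drop leading/trailing dots and underscores left over from the replacement.
    cleaned = cleaned.strip("._")
    return cleaned or default


# Hypothetical title containing a path separator.
name = safe_filename("Cozy 1BR / Mission District")
print(f"screenshots/{name}.png")
```

In the loop this would replace apartment_name.replace(" ", "_"). The screenshots directory also has to exist before save_screenshot can write to it.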