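I'm trying to scrape the following website: travelocity.com.
My code should do the following:
But when it clicks the "Show more" button, no additional results load; it just keeps clicking and nothing happens. What could be the problem?
It's this button at the bottom of the page:
My current code for clicking and releasing the button is as follows: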
try:
    show_more_button = driver.find_element(By.XPATH, "//button[contains(text(), 'Show more')]")
    action_chains = ActionChains(driver)
    action_chains.click_and_hold(show_more_button).perform()
    time.sleep(1)
    action_chains.release().perform()
    time.sleep(5)
except NoSuchElementException:
    print("All results are loaded.")
    break
I have also tried:
show_more_button = driver.find_element(By.XPATH, "//button[contains(text(), 'Show more')]")
show_more_button.click()
But that didn't work well either.
The full code is below:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
URL = 'https://www.travelocity.com/Hotel-Search?destination=San%20Francisco%20%28and%20vicinity%29%2C%20California%2C%20United%20States%20of%20America&regionId=178305&latLong=37.7874%2C-122.4082&flexibility=0_DAY&d1=2024-05-08&startDate=2024-05-08&d2=2024-05-22&endDate=2024-05-22&adults=2&rooms=1&theme=&userIntent=&semdtl=&useRewards=false&sort=RECOMMENDED'
driver = webdriver.Chrome()
driver.get(URL)
driver.maximize_window()
time.sleep(4)
while True:
    try:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        show_more_button = driver.find_element(By.XPATH, "//button[contains(text(), 'Show more')]")
        action_chains = ActionChains(driver)
        action_chains.click_and_hold(show_more_button).perform()
        time.sleep(0.5)
        action_chains.release().perform()
        time.sleep(3)
    except NoSuchElementException:
        print("All results are loaded.")
        break
driver.quit()
That site detects Selenium and then blocks its data. You'll need a stealthy Python Selenium framework, such as https://github.com/seleniumbase/SeleniumBase, to get around that.
pip install seleniumbase
Then run the code below with python:
from seleniumbase import SB
with SB(uc=True) as sb:
    url = "https://www.travelocity.com/Hotel-Search?destination=San%20Francisco%20%28and%20vicinity%29%2C%20California%2C%20United%20States%20of%20America&regionId=178305&latLong=37.7874%2C-122.4082&flexibility=0_DAY&d1=2024-05-08&startDate=2024-05-08&d2=2024-05-22&endDate=2024-05-22&adults=2&rooms=1&theme=&userIntent=&semdtl=&useRewards=false&sort=RECOMMENDED"
    sb.driver.uc_open_with_reconnect(url, 6)
    sb.driver.uc_click("section > button.uitk-button-secondary")
    breakpoint()
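For information about SeleniumBase stealth mode (UC Mode), which is used to evade bot detection, see SeleniumBase/help_docs/uc_mode.
Also note that the uc_click(selector) method expects a CSS Selector in order to do the full stealthy click. ("section > button.uitk-button-secondary" is the CSS Selector for the Show more button.)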
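The snippet above clicks the button once and then stops at breakpoint(). If you want to keep clicking until every result is loaded, as the loop in the question tried to do, something like the following might work. Treat it as a rough sketch, not a tested solution: load_all_results and scrape are hypothetical helper names, and the 3-second pause and the 50-click cap are guesses you would need to tune. It also assumes the button is no longer visible once all results are loaded, which the question's code assumes too. The cap is there so the loop cannot spin forever if the button stays on the page but nothing loads.

```python
import time


def load_all_results(click, is_visible, pause=3.0, max_clicks=50):
    """Call click() while is_visible() is true; return the number of clicks.

    Hypothetical helper: `click` and `is_visible` are plain callables, so the
    loop itself does not depend on any particular browser driver.
    """
    clicks = 0
    while clicks < max_clicks and is_visible():
        click()
        clicks += 1
        time.sleep(pause)  # give the page time to append the next batch
    return clicks


def scrape(url):
    """Open `url` in UC Mode and click "Show more" until it disappears."""
    # Imported here so load_all_results() can be used without SeleniumBase.
    from seleniumbase import SB

    selector = "section > button.uitk-button-secondary"  # from the answer above
    with SB(uc=True) as sb:
        sb.driver.uc_open_with_reconnect(url, 6)
        total = load_all_results(
            click=lambda: sb.driver.uc_click(selector),
            is_visible=lambda: sb.is_element_visible(selector),
        )
        print(f"Clicked 'Show more' {total} time(s).")
```

Call scrape(url) with the same search URL as above. If the click count always hits the cap, the button is probably still there without loading anything new, which would suggest the site is still blocking the session.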