I need to use Python to automatically download the .csv file from this web page:
https://pace.coe.int/en/aplist/committees/9/commission-des-questions-politiques-et-de-la-democratie
So far I have written this code:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
import time
chromedriver_path = r"./driver/chromedriver"
browser = webdriver.Chrome(executable_path=chromedriver_path)
url = "https://pace.coe.int/en/aplist/committees/9/commission-des-questions-politiques-et-de-la-democratie"
topics_xpath = '//*[@id="challenge-stage"]/div/label/span[2]'
browser.get(url)
time.sleep(5) #Wait a little for page to load.
escolhe = browser.find_element("xpath", topics_xpath)
time.sleep(5)
escolhe.click()
time.sleep(5)
The page opens and then asks me to click "Verify you are human".
I inspected the button and copied its XPath (see the code above), but I get this error:
NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="challenge-stage"]/div/label/span[2]"}
(Session info: chrome=114.0.5735.198)
Can anyone help me?
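The checkbox associated with the text "Verify you are human" is inside an <iframe>, so a lookup against the top-level document cannot find it. You have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired element to be clickable.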
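To confirm that the element really sits in a separate frame, and to see which attributes are available for the frame locator, you can list the <iframe> tags in the top-level page source. Below is a minimal sketch using only the standard library. IframeLister, list_iframes, and the sample markup are made up for illustration; with a live session you would pass browser.page_source instead. The parent document contains only the <iframe> tag itself, not the frame's inner document, which is why an XPath evaluated against the parent cannot reach the checkbox.

```python
from html.parser import HTMLParser


class IframeLister(HTMLParser):
    """Collect the attributes of every <iframe> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag == "iframe":
            self.iframes.append(dict(attrs))


def list_iframes(page_source):
    """Return one attribute dict per <iframe> found in page_source."""
    parser = IframeLister()
    parser.feed(page_source)
    parser.close()
    return parser.iframes


# Hypothetical markup, invented for this example only.
sample = """
<html><body>
  <div id="wrapper">
    <iframe src="https://example.com/widget" title="Example challenge widget"></iframe>
  </div>
</body></html>
"""

for attrs in list_iframes(sample):
    print(attrs.get("title"), attrs.get("src"))
```

Any title or src printed this way can then be used to build the frame selector.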
You can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://pace.coe.int/en/aplist/committees/9/commission-des-questions-politiques-et-de-la-democratie")
time.sleep(5)
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='Widget containing a Cloudflare security challenge']")))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "label.ctp-checkbox-label"))).click()
Using XPATH:
driver.get("https://pace.coe.int/en/aplist/committees/9/commission-des-questions-politiques-et-de-la-democratie")
time.sleep(5)
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@title='Widget containing a Cloudflare security challenge']")))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//label[@class='ctp-checkbox-label']"))).click()
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
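One detail the snippets leave out: after clicking inside the frame, the driver stays in that frame until you switch back, so later lookups on the main page will fail. Here is a rough sketch of a helper that wraps the whole sequence and always returns to the top-level document. The names poll and click_inside_frame are hypothetical, and the hand-rolled polling loop is only there to keep the sketch free of third-party imports; in real code WebDriverWait with expected_conditions does the same job. The helper relies only on driver.find_elements, driver.switch_to.frame, and driver.switch_to.default_content, and it does not handle stale elements.

```python
import time


def poll(condition, timeout=20.0, interval=0.5):
    """Call condition() until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


def click_inside_frame(driver, frame_css, target_css, timeout=20.0, interval=0.5):
    """Switch into the iframe matching frame_css, click target_css, switch back."""

    def first_match(css):
        # find_elements returns an empty list instead of raising when nothing matches.
        found = driver.find_elements("css selector", css)
        return found[0] if found else None

    def clickable_target():
        element = first_match(target_css)
        if element is not None and element.is_displayed() and element.is_enabled():
            return element
        return None

    frame = poll(lambda: first_match(frame_css), timeout, interval)
    driver.switch_to.frame(frame)
    try:
        poll(clickable_target, timeout, interval).click()
    finally:
        # Return to the top-level document so lookups on the main page work again.
        driver.switch_to.default_content()


# Possible call with the selectors from this answer (needs a live driver):
# click_inside_frame(
#     driver,
#     "iframe[title='Widget containing a Cloudflare security challenge']",
#     "label.ctp-checkbox-label",
# )
```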
Here is a complete SeleniumBase script for bypassing Cloudflare on that site. Install SeleniumBase with pip install seleniumbase, then run the script with python:
from seleniumbase import SB

def verify_success(sb):
    sb.assert_element('img[alt="Logo Assembly"]', timeout=8)
    sb.sleep(4)

with SB(uc_cdp=True, guest_mode=True) as sb:
    sb.open("https://pace.coe.int/en/aplist/committees/9/commission-des-questions-politiques-et-de-la-democratie")
    try:
        verify_success(sb)
    except Exception:
        if sb.is_element_visible('input[value*="Verify"]'):
            sb.click('input[value*="Verify"]')
        elif sb.is_element_visible('iframe[title*="challenge"]'):
            sb.switch_to_frame('iframe[title*="challenge"]')
            sb.click("span.mark")
        else:
            raise Exception("Detected!")
        try:
            verify_success(sb)
        except Exception:
            raise Exception("Detected!")
It only clicks the checkbox when necessary. Use sb.driver to access the raw driver. The script tries to avoid detection entirely.
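@Michael Mintz - Thank you, your solution is great! Any idea why I have to run your script first before I can apply it to a different URL? Otherwise I get an error saying that uc_cdp=True is not an expected argument.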
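One way to narrow this down is to check whether the SB callable that your interpreter actually imports accepts the uc_cdp keyword at all. If it does not, one possible cause (an assumption, not something confirmed in this thread) is that a different or older seleniumbase install is being picked up in that environment; pip show seleniumbase reports the installed version. Below is a small sketch using inspect.signature. accepts_keyword, old_api, and new_api are hypothetical names used for illustration, and in practice you would pass SB itself, assuming it exposes its parameters in its signature.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keywordable = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    param = params.get(name)
    if param is not None and param.kind in keywordable:
        return True
    # A **kwargs catch-all accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer version of some API.
def old_api(uc=None):
    pass


def new_api(uc=None, uc_cdp=None):
    pass


print(accepts_keyword(old_api, "uc_cdp"))  # False
print(accepts_keyword(new_api, "uc_cdp"))  # True
```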