Unable to scrape a web page whose institutions open in pop-up windows, using Selenium

Problem description

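I am scraping the website https://www.whed.net/results_institutions.php. I can select a country name from the drop-down list and click OK to get the results. The results page then contains a separate a tag for each institution, and I have to click each one to get the institution's name, WWW address, and city.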

I wrote this code for the country Afghanistan:

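import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

url = "https://www.whed.net/results_institutions.php"

service = Service("C:/Selenium_drivers/chromedriver-win64/chromedriver.exe")

driver = webdriver.Chrome(service=service)
wait = WebDriverWait(driver, 10)  # used further down to wait for the Close button; 10 s is an arbitrary timeout

driver.get(url)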

country = 'Afghanistan'

institues = []
cities = []
wwws = []

drop_down = Select(driver.find_element(By.XPATH, '//select'))
drop_down.select_by_visible_text(country)

all_institute = driver.find_element(By.XPATH, "//input[@id='membre2']")
if not all_institute.is_selected():
    all_institute.click()
    
button = driver.find_element(By.XPATH, "//input[@type='button']")

button.click()


results_per_page = Select(driver.find_element(By.XPATH, "//select[@name='nbr_ref_pge']"))
results_per_page.select_by_visible_text('100')


total_results = int(driver.find_element(By.XPATH, "//p[@class='infos']").text.split()[0])

max_iter = (total_results + 99) // 100  # ceiling division: number of pages at 100 results per page
iterations = 0

go_on = True

while go_on:
    iterations += 1
    
    institutions = driver.find_elements(By.XPATH, "//li[contains(@class, 'clearfix plus')]")
    
    
    for institute in institutions:

            link = institute.find_element(By.XPATH, ".//h3/a")
            link.click()
            
            time.sleep(2)
            
            pop_up = driver.find_element(By.XPATH, "//iframe[starts-with(@id, 'fancybox-frame')]")
            
            driver.switch_to_frame(pop_up)

    #             main_window = driver.current_window_handle  # Store the handle of the main window
    #             popup_window = None

    #             for window_handle in driver.window_handles:
    #                 if window_handle != main_window:
    #                     popup_window = window_handle

            # Switch to the popup window
    #             driver.switch_to.window(popup_window)

            institute_name = driver.find_element(By.XPATH, "//div[@class='detail_right']/div[1]").text

            city = driver.find_element(By.XPATH, "//span[@class='libelle' and text() = 'City:']/following-sibling::span[@class='contenu']").text

            www = driver.find_element(By.XPATH, "//span[@class='libelle' and text() = 'WWW:']/following-sibling::span[@class='contenu']").get_attribute("title")

            institues.append(institute_name)

            cities.append(city)

            wwws.append(www)

            close_button = wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@title='Close']")))
            close_button.click()

#             driver.switch_to.window(main_window)

    if iterations >= max_iter:
        go_on = False
        break
        
    time.sleep(2)
    
    next_page = driver.find_elements(By.XPATH, "//a[@title='Next page' ]")[0]
    next_page.click()

This sets up the Selenium driver but fails to click the institution links. Please help.

python selenium-webdriver web-scraping extract
1 Answer

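The elements you are trying to access are inside an iframe tag, so you must switch to that iframe before you can access them. Note that the driver.switch_to_frame(...) call used in the question was removed in Selenium 4; the current form is driver.switch_to.frame(...):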

driver.switch_to.frame(driver.find_element(By.CSS_SELECTOR, ".fancybox-iframe"))

After that, you should be able to interact with the elements inside the iframe you just switched to.
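As a rough sketch of what reading the fields could look like once the driver is inside the iframe, the function below reuses the XPath expressions from the question. The names read_institution, _field, and XPATH are illustrative and not part of Selenium; the string "xpath" is the value of Selenium's By.XPATH, used here only so the snippet needs no imports. It assumes the pop-up's markup matches the XPaths in the question, which has not been verified against the live site.

```python
XPATH = "xpath"  # the literal value of selenium's By.XPATH


def _field(driver, label):
    """Return the element that holds the value next to a label such as 'City:'."""
    return driver.find_element(
        XPATH,
        f"//span[@class='libelle' and text()='{label}']"
        "/following-sibling::span[@class='contenu']",
    )


def read_institution(driver):
    """Read name, city and website from the institution detail page.

    Assumes the driver has already been switched into the pop-up iframe.
    """
    name = driver.find_element(XPATH, "//div[@class='detail_right']/div[1]").text
    return {
        "name": name,
        "city": _field(driver, "City:").text,
        # The question reads the URL from the title attribute, so this does too.
        "www": _field(driver, "WWW:").get_attribute("title"),
    }
```

Returning one dict per institution, rather than appending to three parallel lists, keeps each name, city, and URL together if one lookup fails partway through.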

To access elements outside the iframe again, you must also switch back out:

driver.switch_to.parent_frame()
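One way to make sure the switch out always happens is to pair the two calls in a context manager. The sketch below is only an illustration: inside_frame and scrape_popup are hypothetical helper names, the locator strings "xpath" and "css selector" are the values of Selenium's By.XPATH and By.CSS_SELECTOR, and it assumes that the Close link from the question sits in the main document rather than inside the iframe.

```python
from contextlib import contextmanager

XPATH = "xpath"       # same value as selenium's By.XPATH
CSS = "css selector"  # same value as selenium's By.CSS_SELECTOR


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame` and always switch back to its parent afterwards."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs even if a lookup inside the frame raises, so the driver
        # is never left stranded inside the pop-up.
        driver.switch_to.parent_frame()


def scrape_popup(driver, read):
    """Run `read(driver)` inside the pop-up iframe, then close the pop-up."""
    frame = driver.find_element(CSS, ".fancybox-iframe")
    with inside_frame(driver, frame):
        data = read(driver)
    # Back in the main document (assumption: the Close link lives here).
    driver.find_element(XPATH, "//a[@title='Close']").click()
    return data
```

Inside the question's loop this would be called right after link.click(), passing any function that reads the fields from the detail page as read.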