StaleElementReferenceException: my Selenium code won't move to the next page

Question    Votes: 0    Answers: 2

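I'm trying to scrape several pages of a website using Selenium and Python, but my code keeps failing. I want to type the page number into the input box at the bottom of each page. Right now the code does enter the page number, but it breaks as soon as the new page loads: I can scrape the first page, and once the second page loads the code raises an exception.

Here is my code: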

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException

driver = webdriver.Safari()
wait = WebDriverWait(driver, 1)
driver.get("http://www.incometaxindia.gov.in/Pages/utilities/exempted-institutions.aspx")


call_names = {"Address": "Address", "State": "State", "City": "City", "Chief Commissioner of Income Tax Cadre Controlling Authority (CCIT- CCA) / DGIT (Exemptions)":"CCIT_DGIT_Exemptions", "Chief Commissioner of Income Tax (CCIT)":"CCIT", "Commissioner of Income Tax (CIT)": "CIT","Approved under Section": "Approved_under_Section", "Date of Order (DD/MM/YYYY)": "Date_of_order", "Date of Withdrawal/Cancellation (DD/MM/YYYY)":"Date_of_withdrawal", "Date of Expiry (DD/MM/YYYY)": "Date_of_Expiry", "Remarks": "Remarks"}


while True:

    for elem in wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME,"faq-sub-content exempted-result"))):

        listofIDstoScrape = []

        name = elem.find_elements_by_class_name("fc-blue fquph")
        pancard = elem.find_elements_by_class_name("pan-id")
        details = driver.find_elements_by_class_name("exempted-detail")
        for i in details:
            pan = i.text

        wait.until(EC.presence_of_element_located((By.TAG_NAME, 'li')))

        for n, p, key in zip(name, pancard, details):
            main_list = {"Name": (n.text.replace(p.text,'')), "Pancard": p.text}

            for elem_li in key.find_elements_by_tag_name("li"):
                main_list[call_names [elem_li.find_element_by_tag_name('strong').text]] = elem_li.find_element_by_tag_name('span').text

            print (main_list)

    try:
        for k in range(2,10):
                myElem = WebDriverWait(driver, 1).until(EC.presence_of_element_located((By.ID, "ctl00_SPWebPartManager1_g_d6877ff2_42a8_4804_8802_6d49230dae8a_ctl00_txtPageNumber")))
                myElem.send_keys(str(k))
                myElem.send_keys(Keys.RETURN)


        print ("Page is ready!")
        break

    except TimeoutException:
            print ("Loading took too much time!")

Here is the error:

    ---------------------------------------------------------------------------
StaleElementReferenceException            Traceback (most recent call last)
<ipython-input-66-aa6debbcbeae> in <module>()
     32 
     33             for elem_li in key.find_elements_by_tag_name("li"):
---> 34                 main_list[call_names [elem_li.find_element_by_tag_name('strong').text]] = elem_li.find_element_by_tag_name('span').text
     35 
     36             print (main_list)

/anaconda/lib/python3.6/site-packages/selenium/webdriver/remote/webelement.py in find_element_by_tag_name(self, name)
    230             - name - name of html tag (eg: h1, a, span)
    231         """
--> 232         return self.find_element(by=By.TAG_NAME, value=name)
    233 
    234     def find_elements_by_tag_name(self, name):

/anaconda/lib/python3.6/site-packages/selenium/webdriver/remote/webelement.py in find_element(self, by, value)
    516 
    517         return self._execute(Command.FIND_CHILD_ELEMENT,
--> 518                              {"using": by, "value": value})['value']
    519 
    520     def find_elements(self, by=By.ID, value=None):

/anaconda/lib/python3.6/site-packages/selenium/webdriver/remote/webelement.py in _execute(self, command, params)
    499             params = {}
    500         params['id'] = self._id
--> 501         return self._parent.execute(command, params)
    502 
    503     def find_element(self, by=By.ID, value=None):

/anaconda/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
    309         response = self.command_executor.execute(driver_command, params)
    310         if response:
--> 311             self.error_handler.check_response(response)
    312             response['value'] = self._unwrap_value(
    313                 response.get('value', None))

/anaconda/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
    235         elif exception_class == UnexpectedAlertPresentException and 'alert' in value:
    236             raise exception_class(message, screen, stacktrace, value['alert'].get('text'))
--> 237         raise exception_class(message, screen, stacktrace)
    238 
    239     def _value_or_default(self, obj, key, default):

StaleElementReferenceException: Message: An element command failed because the referenced element is no longer available.

This is what the output looks like:

{'Name': 'INDIA INCLUSION FOUNDATION', 'Pancard': 'AABTI3598J', 'Address': 'No.250/1, 16th and 17th Cross, \nSampige Road, Malleshwaram,\nBangalore-560003.', 'State': 'KARNATAKA', 'City': 'BANGALORE', 'CCIT_DGIT_Exemptions': 'PR.CCIT BENGALURU', 'CCIT': 'CCIT(E) NEW DELHI', 'CIT': 'CIT(E) BENGALURU', 'Approved_under_Section': '12A', 'Date_of_order': '30/03/3017', 'Date_of_withdrawal': ' -  ', 'Date_of_Expiry': ' -  ', 'Remarks': ' - '}
python selenium web-scraping staleelementreferenceexception
2 Answers
0
votes

I ran into a similar problem with StaleElementReferenceException.

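The problem is that after you enter the next page number and send Keys.RETURN, Selenium does find the elements you are waiting for, but they are elements of the old page. Once the next page loads, those elements are detached from the DOM and replaced by the new page's content. Selenium then keeps interacting with the old page's elements, which are no longer attached to the DOM, and that raises the StaleElementReferenceException.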
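One way to avoid holding on to old elements is to copy everything you need into plain Python strings before the next page number is sent. The following is only a sketch: `scrape_page` is a hypothetical helper name, the CSS selectors are derived from the class names in the question's code, and it assumes the name, PAN and detail blocks appear in the same order on the page. The strings "css selector" and "tag name" are the values behind `By.CSS_SELECTOR` and `By.TAG_NAME`.

```python
def scrape_page(driver, call_names):
    """Return one dict of plain strings per record on the current page.

    Nothing returned here refers to a WebElement, so the result stays
    usable after the browser has moved on to another page.
    """
    # Assumed selectors, taken from the class names in the question.
    names = driver.find_elements("css selector", ".fc-blue.fquph")
    pans = driver.find_elements("css selector", ".pan-id")
    details = driver.find_elements("css selector", ".exempted-detail")

    records = []
    for name, pan, detail in zip(names, pans, details):
        pan_text = pan.text
        record = {
            "Name": name.text.replace(pan_text, "").strip(),
            "Pancard": pan_text,
        }
        for item in detail.find_elements("tag name", "li"):
            label = item.find_element("tag name", "strong").text
            value = item.find_element("tag name", "span").text
            # Fall back to the raw label if it is missing from call_names.
            record[call_names.get(label, label)] = value
        records.append(record)
    return records
```

Calling `scrape_page(driver, call_names)` once per page, and only afterwards typing the next page number, keeps every element lookup on the page that is currently loaded.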

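After sending Keys.RETURN, you have to wait until the next page has fully loaded before starting the loop again. The condition you wait on has to be something other than presence_of_all_elements_located((By.CLASS_NAME, "faq-sub-content exempted-result")), because the old page already satisfies it.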

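A good strategy for you might be to wait until the pager entry for the page you are navigating to has the class "NumericalPagerSelected". How to wait for an element's attribute to take a specific value is described here: Using selenium webdriver to wait the attribute of element to change value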
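A rough sketch of such a wait condition follows. `pager_shows` is a hypothetical name, and the code assumes that the highlighted pager entry carries the class "NumericalPagerSelected" and that its visible text is the page number; check both against the real page before relying on it.

```python
def pager_shows(page_number):
    """Build a wait condition that is true once the pager highlights page_number."""
    wanted = str(page_number)

    def condition(driver):
        # Re-locate on every poll so no reference from the old page is reused.
        selected = driver.find_elements("css selector", ".NumericalPagerSelected")
        return any(el.text.strip() == wanted for el in selected)

    return condition
```

It would be passed to the wait right after Keys.RETURN is sent, for example `WebDriverWait(driver, 10).until(pager_shows(k))`, and the scraping loop would run again only once that call returns.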

See also how I solved it in: StaleElementException when Clicking on a TableRow in an Angular WebPage


0
votes

A StaleElementReferenceException is thrown in the following cases:

  • The element has been deleted entirely.
  • The element is no longer attached to the DOM.

In your case, the exception is thrown by one of the find_element_by_tag_name() calls on line 34, the line marked in the traceback.

Make sure the element exists. If it does, try waiting for it for a while before locating it.
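The "wait a little and look it up again" idea can be sketched as a small retry helper. `retry_on` is a hypothetical name and the default attempt count and delay are arbitrary; the important part is that the action passed in locates the element again on every attempt instead of reusing an old reference.

```python
import time


def retry_on(exceptions, action, attempts=3, delay=0.5):
    """Call action() until it succeeds or the attempts run out.

    exceptions is an exception class (or tuple of classes) that should
    trigger another attempt; anything else propagates immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)  # give the page time to finish updating
```

With Selenium this might be called as `retry_on(StaleElementReferenceException, lambda: driver.find_element(By.TAG_NAME, "li").text)`, with the exception imported from `selenium.common.exceptions`.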
