Selenium stops loading pages after a while

Question

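I wrote a simple script that scrapes recipes from a website. The URL of each recipe is stored in an Excel file, which I read with pandas. I have a strange problem: say I want to scrape 100 recipes. When the for loop reaches i = 21 it breaks and the page never loads (the site keeps loading forever), but when I start the for loop at 20 it breaks at 41. If I rerun the code it may break at i = 17, so it is fairly random. Has anyone had a similar problem? Website: https://akispetretzikis.com/en Thank you.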

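import os
import random
import time
from datetime import datetime

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

# readExcel, getDataFromRecipe and writeOnExcel are helper functions not shown here.


def mainProgram(start):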
    now = datetime.now()
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-infobars')
    options.add_argument('--disable-dev-shm-usage')
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')                                                                        
    theDictionary = {"Link": [], "Name": [], "Time": [], "Difficulty": [],     
                     "Merides": [], "Ingredients": [],
                     "ThermidesPer100gr": [], "ThermidesAnaMerida": []}
    driver = webdriver.Chrome(executable_path=r'/usr/lib/chromium-browser/chromedriver', 
                              options=options)
    driver.set_window_size(1280, 960)                                                
    thePath = os.path.join(os.path.expanduser("~"), "Desktop", "ScrapeRecipes",   
                           "Cooking"+str(now.year)+".xlsx")
    thePathReadExcel = os.path.join(os.path.expanduser("~"), "Desktop", 
                                    "CookingUrls"+str(now.year)+".xlsx")
    UrlOfRecipes = readExcel(thePath=thePathReadExcel)


    try:
        Length = len(UrlOfRecipes)
        print(Length)
        Length = 100  # e.g. 100; the actual length is over 1k
        for i in range(start, Length, 1):
            driver.delete_all_cookies()
            driver.get(UrlOfRecipes["Link"][i])
            wait = WebDriverWait(driver, 20 + round(random.uniform(0, 4), 2))
            time.sleep(30 + round(random.uniform(0, 4), 2))  # mandatory sleep
            theDictionary["Link"].append(UrlOfRecipes["Link"][i])
            theDictionary = getDataFromRecipe(driver, theDictionary)
            time.sleep(20 + round(random.uniform(0, 4), 2))
            print(i)
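    except Exception as e:
        print(e)
    finally:
        # Save whatever was scraped, whether or not the loop finished.
        # (The original referenced an undefined name, theDict.)
        writeOnExcel(theDictionary, thePath)
        driver.quit()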
python selenium
1 Answer

I am facing the same problem, and I still have not found a solution.
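
One thing that may be worth trying (a sketch only, not a confirmed fix for this site): put an explicit page-load timeout on the driver so that `driver.get()` raises instead of waiting on a page that never finishes loading, and replace the browser when that happens. In the code below, `get_with_retry`, `make_driver`, `retry_on` and `pause` are hypothetical names introduced for illustration; `set_page_load_timeout`, `get` and `quit` are the regular Selenium WebDriver methods.

```python
import time


def get_with_retry(make_driver, driver, url, attempts=3, page_timeout=60,
                   retry_on=(Exception,), pause=5):
    """Load url, replacing the browser when a page load hangs or fails.

    make_driver is a zero-argument callable that returns a fresh driver
    (hypothetical; for example a function wrapping webdriver.Chrome).
    Returns the driver that loaded the page, which may be a new one.
    If every attempt fails, the last error is re-raised and the last
    driver has already been quit.
    """
    for attempt in range(1, attempts + 1):
        try:
            # Make get() raise after page_timeout seconds instead of
            # blocking on a page that never finishes loading.
            driver.set_page_load_timeout(page_timeout)
            driver.get(url)
            return driver
        except retry_on as exc:
            print(f"attempt {attempt} failed for {url}: {exc}")
            try:
                driver.quit()  # discard the stuck browser session
            except Exception:
                pass  # the session may already be dead
            if attempt == attempts:
                raise
            time.sleep(pause)
            driver = make_driver()  # start over with a clean browser
```

In the loop from the question, the call would look like `driver = get_with_retry(make_driver, driver, UrlOfRecipes["Link"][i])`. With Selenium it is probably better to pass `retry_on=(TimeoutException, WebDriverException)` (both from `selenium.common.exceptions`) so that unrelated bugs are not retried. Whether a fresh browser actually avoids the hang on this site is an assumption; if the site is throttling the scraper, a longer pause may matter more than the restart.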
