Python Selenium: raise TimeoutException(message, screen, stacktrace) TimeoutException: Message:

Question (votes: 0, answers: 1)

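I have a web scraping project in which I am trying to scrape some data from a web page. I picked a website called wykop.pl, which is roughly the Polish Reddit.

My idea is that Selenium opens the page, accepts the cookies, closes the ad (if it pops up; it does not appear 100% of the time), scrolls to the bottom of the page (optional, I don't think it is needed), and then clicks the next-page button using a CSS selector.

Here is my code: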

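import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

website = "https://wykop.pl/hity/roku/strona/1"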

cookies_button_xpath = '''
//button[contains(@class,'qxOn2zvg e1sXLPUy')]''' #relative xpath for accepting cookies




# chromepath is the path to the local chromedriver executable (not defined in this snippet)
service_chrome = Service(executable_path=chromepath)
options_chrome = webdriver.ChromeOptions()
driver_chrome = webdriver.Chrome(service=service_chrome, options=options_chrome)  # open Chrome

driver_chrome.maximize_window()  # maximizes the browser window
driver_chrome.get(website)  # opens the website

time.sleep(3) # sometimes there can be some delays when accessing website, one can specify waiting for couple of secs

content = driver_chrome.find_element('xpath',cookies_button_xpath) # finds the button
content.click() # clicks the button
# WORKS
#next_page_class_next = driver_chrome.find_element_by_css_selector("li.next")

# removed, now it has to be done like this



# a css selector to target the next page button with the class "next"
next_page_css_selector = 'next > a'

try:
    # Wait for the close button of the ad to be visible
    close_ad_button = WebDriverWait(driver_chrome, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button[data-v-6fdb93ea]")))
    
    # if the ad appears
    close_ad_button.click()
except Exception:
    # If the ad doesn't appear
    pass


# get us to the bottom of the page
driver_chrome.execute_script("window.scrollTo(0, document.body.scrollHeight);")
# wait for the next page button to be clickable
next_page = WebDriverWait(driver_chrome, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, next_page_css_selector))).click()

This is the error:

---------------------------------------------------------------------------
TimeoutException                          Traceback (most recent call last)
Cell In[27], line 47
     45 driver_chrome.execute_script("window.scrollTo(0, document.body.scrollHeight);")
     46 # wait for the next page button to be clickable
---> 47 next_page = WebDriverWait(driver_chrome, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, next_page_css_selector))).click()

File ~\miniconda3\envs\Piotrus\Lib\site-packages\selenium\webdriver\support\wait.py:105, in WebDriverWait.until(self, method, message)
    103     if time.monotonic() > end_time:
    104         break
--> 105 raise TimeoutException(message, screen, stacktrace)

TimeoutException: Message: 

I tried an XPath-based solution, and the problem is the same.

I tried increasing the timeout from 10 seconds to 30, 50, and 70 seconds. Nothing worked.
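A longer timeout not helping is consistent with how the wait loop behaves: it polls the condition until the deadline and, if the condition never becomes truthy, raises an exception whose message is empty unless one was passed in. The sketch below is a simplified, Selenium-free model of that loop, not the library's actual code; `wait_until` and `never_matches` are hypothetical names, and the local `TimeoutException` is only a stand-in for Selenium's class.

```python
import time


class TimeoutException(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(condition, timeout, poll_frequency=0.5):
    """Simplified model of WebDriverWait(driver, timeout).until(condition)."""
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # condition met: hand back whatever it returned
        time.sleep(poll_frequency)
        if time.monotonic() > end_time:
            break
    # No message was supplied, so the exception text is empty,
    # which matches the bare "TimeoutException: Message:" in the traceback.
    raise TimeoutException("")


def never_matches():
    """Models a locator that matches nothing on the page."""
    return False


# A condition that can never succeed times out no matter how long we wait.
for timeout in (0.1, 0.3):
    try:
        wait_until(never_matches, timeout, poll_frequency=0.05)
    except TimeoutException:
        print(f"timed out after {timeout}s")
```

If the locator itself matches nothing, raising the timeout only delays the same exception, so the selector is the first thing to check.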

I tried other variants of the CSS selector, for example

next_page_css_selector = "li.next > a"

It does not work.
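One way to narrow this down is to count how many elements each candidate selector matches before waiting on it. Note that in CSS a bare `next` is a type selector (it targets `<next>` elements), whereas `.next` targets a class. The helper below is a minimal sketch: `count_matches` is a hypothetical name, the candidate list is a guess rather than the site's confirmed markup, and the string `"css selector"` is the value behind `By.CSS_SELECTOR`.

```python
def count_matches(driver, selectors):
    """Return {selector: number of elements it currently matches}."""
    # "css selector" is the string value behind By.CSS_SELECTOR
    return {sel: len(driver.find_elements("css selector", sel)) for sel in selectors}


# Hypothetical candidates; the real markup has to be checked in the browser's DevTools.
candidates = ["next > a", "li.next > a", ".next a", "a[rel='next']"]

# With the browser session from the code above still open:
# print(count_matches(driver_chrome, candidates))
```

A count of 0 means the wait can never succeed with that selector. A non-zero count combined with a timeout would instead suggest the element exists but is not visible or enabled, which is what `element_to_be_clickable` requires.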

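I know the problem is on my side, and I know I am close, because accepting the cookies works with the XPath I obtained.

If you could try to reproduce the code and see what is wrong, I would be very grateful.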

python html selenium-webdriver web-scraping timeoutexception
1 Answer
0 votes

To get the links from the different pages, it is easier to use their Ajax pagination API, for example:

import requests

url = "https://wykop.pl/api/v3/hits/links"
params = {"limit": "20", "page": "1", "sort": "year"}
headers = {
    "Authorization": "Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VybmFtZSI6Inc1Mzk0NzI0MDc0OCIsInVzZXItaXAiOiIxMDUzMTU3MTQ5Iiwicm9sZXMiOlsiUk9MRV9BUFAiXSwiYXBwLWtleSI6Inc1Mzk0NzI0MDc0OCIsImV4cCI6MTcxNDUwMjA5MX0.X2mUIzvmz5FSskFRzuVYX37yAJU9aTlZqI56VqZCvWY"
}

for params["page"] in range(1, 3):  # <-- increase number of pages here
    data = requests.get(url, params=params, headers=headers).json()
    for d in data["data"]:
        print(
            d["votes"]["count"], d["title"], f'{d["votes"]["up"]}/{d["votes"]["down"]}'
        )
        print(d["source"]["url"])
        print()

Prints:


...

5037 Kiedy ekstradycja Sebastiana M. do Polski? 5057/20
https://wykop.pl/artykul/7003275/kiedy-ekstradycja-sebastiana-m-do-polski

5040 Deweloperzy lobbują, aby usunąć wymóg ilości miejsc parkingowych na mieszkanie 5048/8
https://www.money.pl/gospodarka/zmiany-w-lex-deweloper-branza-parkingowy-wymog-musi-zniknac-7000188460038656a.html

5027 TEDE vs PiSowscy, ale to jest piękne xD 5187/160
https://www.threads.net/@lechuczechu/post/C1K9rbwv2dQ

4966 Policjant wyrywa telefon kierowcy niszcząc jego własność, wypiera się, ale wszys 4988/22
https://www.youtube.com/watch?v=Ly5J_46HY_Q

4900 Apel - administracjo zablokuj dodawanie FAME MMA 5272/372
https://wykop.pl/link/7299981/darmowe-fame-mma-reborn-na-tym-dc-https-discord-gg-a5ranypbdv-darmowe-clout-mm
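The request loop in this answer can also be wrapped in a generator, so that the paging logic is separate from the HTTP call and can be tried offline. This is only a sketch: `iter_hits`, `fetch_page`, and `fake_fetch` are hypothetical names, and the response shape (`data`, `votes`, `title`, `source`) is assumed from the fields the answer's code reads.

```python
def iter_hits(fetch_page, pages):
    """Yield (vote_count, title, source_url) for every hit on the given pages."""
    for page in pages:
        payload = fetch_page(page)
        for item in payload["data"]:
            yield item["votes"]["count"], item["title"], item["source"]["url"]


def fake_fetch(page):
    """Offline stand-in that mimics the response fields used in the answer."""
    return {
        "data": [
            {
                "votes": {"count": 10 * page, "up": 11 * page, "down": page},
                "title": f"Example title {page}",
                "source": {"url": f"https://example.com/{page}"},
            }
        ]
    }


# For real requests, fetch_page could wrap the call from the answer, e.g.:
# def fetch_page(page):
#     return requests.get(url, params={**params, "page": page}, headers=headers).json()

for count, title, link in iter_hits(fake_fetch, range(1, 3)):
    print(count, title, link)
```

Passing the fetch function in as an argument also avoids mutating the shared `params` dictionary inside the loop header.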