Handling BrokenPipeError in a Selenium Python script on failure to open a URL and upload to Google Drive


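I'm writing a Python script that goes through a list of URLs, takes a screenshot of each page, and uploads the screenshots to Google Drive using Selenium, the Google API, and GSP. The script should try to open each URL up to five times; if it still cannot open the URL after five attempts, it should use a continue statement to skip the current iteration and move on to the next URL.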

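However, whenever the script fails to open a URL after the specified number of attempts, I get a BrokenPipeError. Instead of moving on to the next URL, the script stops, which is not the intended behavior. Here is the relevant part of the code: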

max_attempts = 5

for record in records:
    url = record['Link']
    folder_id = record['Link to folder']
    successful_connection = False  # Flag to track if connection was successful

    for attempt in range(max_attempts):
        try:
            driver.get(url)
            time.sleep(random.uniform(1, 3))
            successful_connection = True  # Set the flag to True if successful
            break  # Exit the loop if successful
        except Exception as e:  # Catch the specific exception if possible
            print(f"Attempt {attempt + 1} of {max_attempts} failed: {str(e)}")
            time.sleep(10)  # Wait for 10 seconds before retrying

    if not successful_connection:
        print(f"Failed to connect to {url} after {max_attempts} attempts.")
        continue  # Skip the rest of the code in this loop iteration and move to the next record
    
    # If connection was successful, proceed with screenshot and upload
    current_date = datetime.now().strftime('%Y-%m-%d')
    page_width = driver.execute_script('return document.body.scrollWidth')
    page_height = driver.execute_script('return document.body.scrollHeight')
    screenshot_path = f"{current_date}-{record['Client']}-{record['Platform']}.png"
    driver.set_window_size(page_width, page_height)
    driver.save_screenshot(screenshot_path)

    # Upload to Google Drive
    file_metadata = {'name': screenshot_path, 'parents': [folder_id]}
    media = MediaFileUpload(screenshot_path, mimetype='image/png')
    file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    
    os.remove(screenshot_path)

driver.quit()

Error:

    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1331, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1001, in send
    self.sock.sendall(data)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/ssl.py", line 1238, in sendall
    v = self.send(byte_view[count:])
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/ssl.py", line 1207, in send
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe
Error: Process completed with exit code 1.

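I suspect the problem is related to how exceptions are handled or how resources are managed, but I'm not sure how to pinpoint the cause or resolve the BrokenPipeError. Any suggestions or insights into what might cause this and how to fix it would be greatly appreciated.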

I tried creating an empty PNG file and uploading that dummy file when the connection fails, but I still get the same error.

python selenium-webdriver google-api google-drive-api gsp
1 Answer

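Specific exception handling: Catching a broad Exception may catch more than connection-related problems. It is better to catch the more specific exceptions that driver.get() can raise, so that different failures are handled appropriately. For example, you might catch TimeoutException for timeouts, WebDriverException for general WebDriver problems, or others depending on your use case. TimeoutException is a subclass of WebDriverException, so it has to be listed first.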

python

from selenium.common.exceptions import TimeoutException, WebDriverException

for attempt in range(max_attempts):
    try:
        driver.get(url)
        time.sleep(random.uniform(1, 3))
        successful_connection = True
        break
    except TimeoutException as e:
        print(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
        time.sleep(10)
    except WebDriverException as e:
        print(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
        time.sleep(10)
    # Add more specific exceptions as needed

Logging: Consider using the logging module instead of print statements. It gives you better control over log levels, formatting, and where log output is sent.

python

import logging

logging.basicConfig(level=logging.INFO)

for attempt in range(max_attempts):
    try:
        driver.get(url)
        time.sleep(random.uniform(1, 3))
        successful_connection = True
        break
    except TimeoutException as e:
        logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
        time.sleep(10)
    except WebDriverException as e:
        logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
        time.sleep(10)
    # Add more specific exceptions as needed
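
The control over level, format, and destination mentioned above might look like the following minimal sketch. The logger name "screenshot_job" and the helper build_logger are hypothetical, chosen only for illustration; the sketch writes to an in-memory stream so that it runs without Selenium.

```python
import io
import logging


def build_logger(stream, level=logging.INFO):
    """Return a logger that writes timestamped records to `stream`."""
    logger = logging.getLogger("screenshot_job")  # hypothetical logger name
    logger.setLevel(level)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer, level=logging.WARNING)

log.info("Opening page")  # below WARNING, so this record is dropped
# %-style arguments are only formatted if the record is actually emitted
log.error("Attempt %d of %d failed: %s", 1, 5, "timeout")
```

Passing sys.stderr or an open file instead of the StringIO buffer would send the same records to the console or to disk.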

WebDriver cleanup: Make sure the WebDriver is cleaned up even when an exception occurs. You can use a try...finally block to guarantee that driver.quit() is called.

python

try:
    # Your existing code goes here
    ...
finally:
    driver.quit()

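These suggestions are meant to make the script more robust and maintainable. Depending on your use case and requirements, you may need to adjust the exception handling and logging accordingly.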

See what you think of this:

python

import logging
import os
import random
import time
from datetime import datetime

from googleapiclient.http import MediaFileUpload
from selenium.common.exceptions import TimeoutException, WebDriverException

# Configure logging
logging.basicConfig(level=logging.INFO)

max_attempts = 5

# driver, drive_service, and records are assumed to be set up earlier in the script.
try:
    for record in records:
        url = record['Link']
        folder_id = record['Link to folder']
        successful_connection = False  # Flag to track if connection was successful

        for attempt in range(max_attempts):
            try:
                driver.get(url)
                time.sleep(random.uniform(1, 3))
                successful_connection = True  # Set the flag to True if successful
                break  # Exit the loop if successful
            except TimeoutException as e:
                logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
                time.sleep(10)
            except WebDriverException as e:
                logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
                time.sleep(10)
            except Exception as e:  # Catch anything else so the retry loop keeps going
                logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: {str(e)}")
                time.sleep(10)

        if not successful_connection:
            logging.error(f"Failed to connect to {url} after {max_attempts} attempts.")
            continue  # Skip the rest of this iteration and move to the next record

        # If connection was successful, proceed with screenshot and upload
        current_date = datetime.now().strftime('%Y-%m-%d')
        page_width = driver.execute_script('return document.body.scrollWidth')
        page_height = driver.execute_script('return document.body.scrollHeight')
        screenshot_path = f"{current_date}-{record['Client']}-{record['Platform']}.png"
        driver.set_window_size(page_width, page_height)
        driver.save_screenshot(screenshot_path)

        # Upload to Google Drive
        file_metadata = {'name': screenshot_path, 'parents': [folder_id]}
        media = MediaFileUpload(screenshot_path, mimetype='image/png')
        file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()

        os.remove(screenshot_path)
finally:
    # Ensure proper cleanup, even if the loop above raised
    try:
        driver.quit()
    except Exception as e:
        logging.error(f"Failed to quit the WebDriver: {str(e)}")

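In this revised script:

Specific exceptions such as TimeoutException and WebDriverException are caught separately for better error handling.

Logging replaces the print statements for better control and flexibility.

A try...finally block ensures that driver.quit() is called for cleanup, even if an exception occurs during execution.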
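
One more thing worth checking, based on the traceback in the question: the failing call passes through ssl.py, so the broken pipe happens on an HTTPS connection. Selenium usually talks to a local driver over plain HTTP, so the error may come from a Google API request instead (for example the Drive upload), whose idle connection was closed while the script slept between retries. This is an assumption, because the traceback is truncated. If it holds, the except clauses around driver.get() never see the error. Below is a minimal sketch of retrying such a call; call_with_retries and flaky_upload are hypothetical names, and flaky_upload only simulates a failing upload.

```python
import time


def call_with_retries(func, attempts=3, delay=0.0, retry_on=(ConnectionError,)):
    """Call func(); retry on connection errors, re-raising after the last attempt.

    BrokenPipeError is a subclass of ConnectionError, so it is covered
    by the default retry_on tuple.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            print(f"Attempt {attempt} of {attempts} failed: {exc!r}")
            time.sleep(delay)


# Stand-in for an upload that fails twice with a broken pipe, then succeeds.
calls = {"count": 0}


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise BrokenPipeError(32, "Broken pipe")
    return {"id": "fake-file-id"}


result = call_with_retries(flaky_upload, attempts=5)
```

In the script, the upload could be wrapped in the same way, for example by passing a lambda that builds and executes the files().create(...) request. Whether the existing service object recovers after a broken pipe or has to be rebuilt depends on the HTTP client underneath, so that part needs to be verified in your environment.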

Be sure to adapt the script further to your specific requirements and the environment it runs in.
