Selenium with Firefox (GeckoDriver) in an AWS Lambda container: Failed to read marionette port


The Dockerfile for my Lambda container

  • When I run these steps on an EC2 server, the script works fine
  • I know Lambda has a read-only file system, so the environment differs in that respect
# This definitely works
FROM public.ecr.aws/lambda/python:3.9

#   Copy function code
COPY . ${{LAMBDA_TASK_ROOT}}
            
#   Install the function's dependencies using file requirements.txt
#   from your project folder.
            
COPY requirements.txt  .
RUN  pip3 install -r requirements.txt --target "${{LAMBDA_TASK_ROOT}}"

# Installing Firefox and Gecko Driver (problem should be here)
RUN yum -y install amazon-linux-extras
RUN yum -y install Xvfb
RUN PYTHON=python2 amazon-linux-extras install firefox -y
RUN yum -y install wget
RUN wget https://github.com/mozilla/geckodriver/releases/download/v0.32.2/geckodriver-v0.32.2-linux64.tar.gz
RUN yum -y install tar 
RUN tar -xf geckodriver-v0.32.2-linux64.tar.gz
RUN mv geckodriver /usr/local/bin/
RUN export MOZ_HEADLESS=1
RUN export HOME=/tmp/profile

# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "{handler}" ]"""

My script (sets up the Selenium Firefox driver)

  • Using use_portal_driver triggers the error
import os
import contextlib
import shutil
from selenium.webdriver.firefox.options import Options
from selenium import webdriver

@contextlib.contextmanager
def use_portal_driver():
    if not os.path.exists("/tmp/profile"):
        os.makedirs("/tmp/profile")

    options = Options()
    options.set_preference("pdfjs.disabled", True)
    options.set_preference("browser.download.folderList", 2)
    options.set_preference("browser.download.manager.useWindow", False)
    if not os.path.exists("/tmp/portal_downloads"):
        os.makedirs("/tmp/portal_downloads")
    options.set_preference("browser.download.dir", os.path.abspath("/tmp/portal_downloads"))
    options.set_preference("browser.helperApps.neverAsk.saveToDisk",
                           "application/pdf, application/force-download")
    options.add_argument("--headless")
    options.add_argument('--disable-gpu')
    options.add_argument("--profile /tmp/profile")

    driver = webdriver.Firefox(options=options, log_path='/tmp/firefox.log', service_log_path="/tmp/firefox_service.log")  # error appears here
    driver.implicitly_wait(20)
    yield driver
    driver.quit()

The error I get

Traceback (most recent call last):
  File "/var/task/microservices/rechnungspruefung/lambda_functions/rechnungen_fuer_apotheken_filiale_automatisiert_hochladen.py", line 159, in handler
    download = portal.aktuelle_monatsrechnung_herunterladen()
  File "/var/task/microservices/rechnungspruefung/automatisiertes_hochladen/noweda.py", line 21, in aktuelle_monatsrechnung_herunterladen
    with use_portal_driver() as driver:
  File "/var/lang/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/var/task/microservices/rechnungspruefung/automatisiertes_hochladen/portal_driver.py", line 50, in use_portal_driver
    driver = webdriver.Firefox(options=options, log_path='/tmp/firefox.log', service_log_path="/tmp/firefox_service.log")
  File "/var/task/selenium/webdriver/firefox/webdriver.py", line 197, in __init__
    super().__init__(command_executor=executor, options=options, keep_alive=True)
  File "/var/task/selenium/webdriver/remote/webdriver.py", line 288, in __init__
    self.start_session(capabilities, browser_profile)
  File "/var/task/selenium/webdriver/remote/webdriver.py", line 381, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/var/task/selenium/webdriver/remote/webdriver.py", line 444, in execute
    self.error_handler.check_response(response)
  File "/var/task/selenium/webdriver/remote/errorhandler.py", line 249, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Failed to read marionette port

Do I need to do anything else in the Dockerfile or in the script that sets up the Selenium Firefox driver?

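  • I set the Firefox profile to a subdirectory of the tmp directory because I know everything else in the Lambda environment is read-only
  • I expect the script to run normally without throwing this error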
python selenium-webdriver firefox aws-lambda
1 Answer

I was able to get past this by setting the HOME environment variable to /tmp. Add the following to your Dockerfile:

ENV HOME="/tmp"
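
A note on why this likely helps: each RUN instruction in a Dockerfile executes in its own shell, so the RUN export HOME=/tmp/profile line from the question does not carry over into the image or the Lambda runtime, whereas ENV does. If rebuilding the image is inconvenient, the same idea can probably be applied from Python by setting HOME inside the process before the driver is started, since child processes such as geckodriver and Firefox inherit the environment. The sketch below has not been tested on Lambda. The helper name prepare_writable_home is hypothetical and not from the original post, and /tmp is assumed to be the writable location.

```python
import os


def prepare_writable_home(home_dir: str = "/tmp") -> str:
    """Point HOME at a writable directory before the browser is launched.

    Hypothetical helper for illustration; not part of the original
    question or answer.
    """
    # Create the directory if it does not exist yet.
    os.makedirs(home_dir, exist_ok=True)

    # Fail early with a clear message instead of a marionette timeout later.
    if not os.access(home_dir, os.W_OK):
        raise PermissionError(f"{home_dir} is not writable")

    # Processes started after this point inherit the updated environment.
    os.environ["HOME"] = home_dir
    return home_dir
```

One way to use it would be to call prepare_writable_home() at the top of use_portal_driver(), before webdriver.Firefox(...) is constructed. Setting ENV HOME="/tmp" in the Dockerfile remains the simpler option when the image can be changed.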