I'm trying to run a Flask application that runs a web scraper when a certain endpoint is called. Everything is containerized with Docker Compose.
Docker Compose file:
services:
  selenium:
    image: seleniarm/standalone-chromium # using this because I'm on M1
    hostname: local
    volumes:
      - '/dev/shm:/dev/shm'
    ports:
      - '4444:4444'
  api:
    build: api
    restart: always
    hostname: local
    environment:
      - environment=Local
    depends_on:
      - selenium
    ports:
      - '80:80'
    command: ["gunicorn", "-w", "3", "-t", "300", "-b", "0.0.0.0:80", "app:app"]
Dockerfile for the API:
FROM python:3.8
COPY . /api
WORKDIR /api
RUN pip install -r requirements.txt --no-cache-dir
EXPOSE 5000
# not sure if this is necessary?
EXPOSE 4444
Example Python function (it is called by a Flask route, which I know is configured correctly):
import logging
import time

from fake_useragent import UserAgent
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

logger = logging.getLogger(__name__)


def test_webdriver():
    logger.info("Beginning to test webdriver")
    # Pick a random user agent string for this session
    ua = UserAgent()
    user_agent = ua.random
    options = webdriver.ChromeOptions()
    options.add_argument('--log-level=3')
    options.add_argument('--verbose')
    options.add_argument('--ignore-ssl-errors=yes')
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--remote-debugging-port=9222')
    options.add_argument('--disable-dev-shm-usage')
    options.add_experimental_option("excludeSwitches", ['enable-logging'])
    options.add_argument(f'user-agent={user_agent}')
    # Connect to the remote Selenium server
    driver = webdriver.Remote(
        command_executor="http://localhost:4444/wd/hub",
        options=options
    )
    driver.get('https://github.com')
    time.sleep(5)
    driver.quit()
    return
Error:
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0xffffaff19490>: Failed to establish a new connection: [Errno 111] Connection refused
which eventually raises...
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=4444): Max retries exceeded with url: /wd/hub/session (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffffaff19490>: Failed to establish a new connection: [Errno 111] Connection refused'))
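I assume that for some reason my api container cannot reach my selenium container. I verified that visiting http://localhost:4444 returns the Selenium Grid UI, so I know the container is up and running. Any idea why the API can't find it?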
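This is probably related to your Docker setup. You need to make sure the selenium and api services are attached to the same Docker network. You can do that by adding the following at the top level of your docker-compose file: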
networks:
  flaskapp:
    driver: bridge
and then adding that network to both services, for example:
selenium:
  image: seleniarm/standalone-chromium
  hostname: local
  volumes:
    - '/dev/shm:/dev/shm'
  ports:
    - '4444:4444'
  networks:
    - flaskapp # This is what you need to add
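The same goes for your api service. Note that inside the api container, localhost refers to the api container itself, so the driver URL should use the Compose service name instead, e.g. http://selenium:4444/wd/hub.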
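Here is a minimal sketch of how the api side might connect once both services share a network. It assumes the Selenium service keeps the name `selenium` from your compose file. The environment variables `SELENIUM_HOST` and `SELENIUM_PORT` and the three function names are hypothetical, introduced only for illustration. Since `depends_on` only controls start order and does not wait for the browser server to be ready, the sketch also includes a small TCP readiness check.

```python
import os
import socket
import time


def selenium_executor_url(host=None, port=None):
    """Build the Remote WebDriver URL.

    Falls back to the hypothetical SELENIUM_HOST / SELENIUM_PORT environment
    variables, then to the Compose service name "selenium" and port 4444.
    """
    host = host or os.environ.get("SELENIUM_HOST", "selenium")
    port = int(port or os.environ.get("SELENIUM_PORT", "4444"))
    return f"http://{host}:{port}/wd/hub"


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not accepting connections yet; retry until the deadline passes.
            time.sleep(interval)
    return False


def make_remote_driver(options=None):
    """Create a Remote WebDriver pointed at the Selenium container."""
    # Imported lazily so the helpers above work without selenium installed.
    from selenium import webdriver

    options = options or webdriver.ChromeOptions()
    return webdriver.Remote(
        command_executor=selenium_executor_url(),
        options=options,
    )
```

In `test_webdriver` you could then call something like `wait_for_port("selenium", 4444)` before `make_remote_driver(options)`, instead of hard-coding `http://localhost:4444/wd/hub`. Visiting http://localhost:4444 from your browser still works, because that goes through the published port on the host rather than the container network.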