Set a proxy for a Selenoid webdriver


I use a webdriver to connect to a Selenoid container:

from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType

proxy = 'proxy address'   # placeholder for the proxy, e.g. host:port
link = 'https://2ip.ru'   # driver.get() needs an absolute URL, including the scheme

capabilities = {
    "browserName": 'firefox',
    "version": '71.0',
    "platform": 'LINUX'
}
prox = Proxy()
prox.proxy_type = ProxyType.MANUAL
prox.http_proxy = proxy
prox.ssl_proxy = proxy
prox.socks_proxy = proxy
prox.add_to_capabilities(capabilities)
# Selenium 3 style call: desired_capabilities was removed from webdriver.Remote in Selenium 4.10
driver = webdriver.Remote(
    command_executor='http://localhost:4444/wd/hub',
    desired_capabilities=capabilities
)
driver.get(link)

No matter which proxy I use, 2ip.ru and similar sites always show the same IP. Why doesn't Selenoid apply the proxy IP? I tried the images selenoid/firefox:60.0, selenoid/firefox:61.0, selenoid/firefox:62.0, selenoid/firefox:70.0, selenoid/firefox:71.0 and selenoid/firefox:72.0.
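The question's code does two things worth double-checking. It points socksProxy at the same address as the HTTP proxy, and it passes a bare 'proxy address' string. Below is a minimal sketch of the W3C WebDriver "proxy" capability written as a plain dict. It assumes a plain HTTP proxy given as host:port with no scheme. The helper name w3c_manual_proxy is hypothetical, and 203.0.113.5:3128 is a documentation placeholder. It is not confirmed here that this alone fixes the Selenoid case.

```python
from typing import Optional


def w3c_manual_proxy(host_port: str, socks_version: Optional[int] = None) -> dict:
    """Build a W3C 'proxy' capability for a manual proxy (hypothetical helper).

    Assumes host_port looks like 'host:port' and has no scheme.
    """
    if "://" in host_port:
        raise ValueError("pass host:port without a scheme, e.g. '203.0.113.5:3128'")
    proxy = {
        "proxyType": "manual",   # W3C uses lowercase proxyType values
        "httpProxy": host_port,
        "sslProxy": host_port,
    }
    if socks_version is not None:
        # The W3C spec expects socksVersion whenever socksProxy is given,
        # so SOCKS is only added when the caller says it is a SOCKS proxy.
        proxy["socksProxy"] = host_port
        proxy["socksVersion"] = socks_version
    return proxy


capabilities = {
    "browserName": "firefox",
    "browserVersion": "71.0",
    "proxy": w3c_manual_proxy("203.0.113.5:3128"),
}
print(capabilities["proxy"])
```

With Selenium 4, such a dict would normally be attached through an options object, for example FirefoxOptions().set_capability("proxy", ...), rather than through desired_capabilities.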

python-3.x selenium-webdriver web-scraping proxy selenoid
1 Answer

A working approach with Chrome and Selenoid (macOS)

1. Prepare Selenoid

  1. Start Docker (I use Docker Desktop on macOS)

  2. Start Selenoid (I use the Configuration Manager, cm_darwin_amd64 on macOS; docs: https://aerokube.com/selenoid/latest/#_start_selenoid)

    # to start selenoid from python code using Jupyter Notebook
    import os
    os.system("selenoid/cm_darwin_amd64 selenoid start")
    
  3. Start Selenoid-UI (I use port 8081; the default is 8080)

    os.system("selenoid/cm_darwin_amd64 selenoid-ui start --port 8081")
    
  4. Check that the containers are running:

    4.1. Selenoid at http://localhost:4444

    4.2. Selenoid-UI at http://localhost:8081/
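You can also script the check in step 4 instead of opening the URLs by hand. The sketch below assumes Selenoid's /status endpoint on port 4444, which returns JSON describing sessions and available browsers. The function name selenoid_status is hypothetical, and only the standard library is used.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def selenoid_status(base_url: str = "http://localhost:4444", timeout: float = 2.0) -> Optional[dict]:
    """Return Selenoid's /status JSON, or None if nothing usable answers at base_url."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/status", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply all count as "not up"
        return None


if __name__ == "__main__":
    status = selenoid_status()
    if status is None:
        print("Selenoid is not reachable on port 4444")
    else:
        # The exact keys depend on the Selenoid version, hence .get()
        print("browsers:", list(status.get("browsers", {})))
```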

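2. Prepare Selenium and a proxy with authentication

Running a Selenium webdriver in a remote browser through Selenoid works much like running a local driver on your own machine. So I reused the Chrome proxy setup from an earlier case: https://stackoverflow.com/a/73744426/5379091 (a big thread on Selenium proxies that may be useful).

Now set the proxy credentials and define a function that quickly creates a driver: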

import zipfile

from selenium import webdriver

# proxy credentials: replace the placeholders with your own values
proxy_param = ("<host>", "<port>", "<user>", "<pwd>")

manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version":"22.0.0"
    }
    """

background_js = """
var config = {
        mode: "fixed_servers",
        rules: {
        singleProxy: {
            scheme: "http",
            host: "%s",
            port: parseInt(%s)
        },
        bypassList: ["localhost"]
        }
    };
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
function callbackFn(details) {
    return {
        authCredentials: {
            username: "%s",
            password: "%s"
        }
    };
}
chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
);
""" % proxy_param

SELENOID_URL="http://localhost:4444/wd/hub"
SELENOID_WEB_URL="http://localhost:8081"


def get_selenoid_driver():

    chrome_options = webdriver.ChromeOptions()

    # choose the browser version (W3C capability) and enable VNC
    # through Selenoid's vendor capability "selenoid:options"
    chrome_options.set_capability("browserVersion", "120.0")
    chrome_options.set_capability("selenoid:options", {
        "enableVNC": True,
        "enableJS": True,
        "enableVideo": False,
        "screenResolution": "1920x1080x24",  # 1280x1024x24
        "sessionTimeout": "3m",
    })

    # enable proxy: pack the auth extension and load it into Chrome
    pluginfile = 'proxy_auth_plugin.zip'
    with zipfile.ZipFile(pluginfile, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)
    chrome_options.add_extension(pluginfile)

    # Selenium 4 style: all capabilities travel inside options
    # (desired_capabilities was removed from webdriver.Remote in Selenium 4.10)
    driver = webdriver.Remote(
        command_executor=SELENOID_URL,
        options=chrome_options,
    )

    # implicit wait for element lookups (if you need it)
    # driver.implicitly_wait(10)

    print(f"You can view all running drivers here: {SELENOID_WEB_URL}")
    print(f"Direct link to VNC: {SELENOID_WEB_URL}/#/sessions/{driver.session_id}")
    return driver
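The extension packaging above writes proxy_auth_plugin.zip to disk and inserts the credentials into background.js with %. A password containing a double quote or backslash would therefore break the script. Below is a minimal in-memory variant, assuming the proxy speaks plain HTTP. It quotes each value with json.dumps, because JSON strings are valid JavaScript string literals, and returns the zip as bytes. The function name build_proxy_extension and the placeholder address and credentials are hypothetical.

```python
import base64
import io
import json
import zipfile

MANIFEST = {
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "<all_urls>", "webRequest", "webRequestBlocking"],
    "background": {"scripts": ["background.js"]},
    "minimum_chrome_version": "22.0.0",
}

BACKGROUND_TEMPLATE = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {scheme: "http", host: %(host)s, port: %(port)d},
        bypassList: ["localhost"]
    }
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {
        return {authCredentials: {username: %(user)s, password: %(pwd)s}};
    },
    {urls: ["<all_urls>"]},
    ["blocking"]
);
"""


def build_proxy_extension(host: str, port: int, user: str, pwd: str) -> bytes:
    """Return the zipped extension; json.dumps quotes and escapes the JS strings."""
    background = BACKGROUND_TEMPLATE % {
        "host": json.dumps(host),
        "port": int(port),
        "user": json.dumps(user),
        "pwd": json.dumps(pwd),
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zp:
        zp.writestr("manifest.json", json.dumps(MANIFEST, indent=2))
        zp.writestr("background.js", background)
    return buf.getvalue()


# Hypothetical placeholder credentials; note the quote inside the password
ext_bytes = build_proxy_extension("203.0.113.5", 3128, "user", 'pa"ss')
encoded = base64.b64encode(ext_bytes).decode("ascii")
```

The base64 string should be usable with chrome_options.add_encoded_extension(encoded) in place of add_extension(pluginfile), which avoids the temporary file. This has not been verified against Selenoid specifically.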

3. Run the Selenium driver with Selenoid

  1. Create the driver

    driver = get_selenoid_driver()
    # You can view all running drivers here: http://localhost:8081
    # Direct link to VNC: http://localhost:8081/#/sessions/<your-session-id>
    
  2. Check the session in the Selenoid UI at http://localhost:8081

    (Screenshot: a working Chrome session in the Selenoid VNC view.)

  3. Check that the proxy works

    url_to_open='https://ifconfig.me'
    
    driver.get(url_to_open)
    html = driver.page_source
    
    print(proxy_param[0] in html)
    # True
    

    That's the proxy IP. Success!
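The check print(proxy_param[0] in html) is a plain substring test. It can pass when the expected address is a prefix of a different one, and it only works when the proxy's exit IP equals its host. A slightly stricter sketch extracts the IPv4 addresses from the page and compares whole addresses. The helper names and the sample page are hypothetical, and the sketch assumes you know which exit IP to expect.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits; validity is checked separately
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def ipv4_addresses(text: str) -> list:
    """Return the valid IPv4 addresses found in text, in order of appearance."""
    found = []
    for candidate in IPV4_RE.findall(text):
        try:
            found.append(str(ipaddress.IPv4Address(candidate)))
        except ipaddress.AddressValueError:
            continue  # e.g. 999.1.1.1 matches the pattern but is not an address
    return found


def exits_through(page_text: str, expected_ip: str) -> bool:
    """True if expected_ip appears on the page as a whole address."""
    return expected_ip in ipv4_addresses(page_text)


sample = "<html><body>Your IP: 203.0.113.57</body></html>"
print("203.0.113.5" in sample)                # substring test: True
print(exits_through(sample, "203.0.113.5"))   # whole-address test: False
```

With a live session this would be called as exits_through(driver.page_source, expected_ip).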

Links that helped me:

  1. https://okhlopkov.com/how-to-parse-any-website/ (the idea of using Selenoid)
  2. https://aerokube.com/cm/latest/ (Configuration Manager, used to install Selenoid; requires Docker)
  3. https://aerokube.com/selenoid/1.4.3/ (documentation for tuning Selenoid)
© www.soinside.com 2019 - 2024. All rights reserved.