Tor Browser with RSelenium on Linux/Windows

Question  Votes: 0  Answers: 3

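I want to use RSelenium with Tor on my Linux machine so that pages report a Tor IP (using Tor Browser's Firefox as the browser). This works in Python, but I am having trouble in R. Has anyone got this working? If so, please share your solution for Windows or Linux.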

# library(devtools)
# devtools::install_github("ropensci/RSelenium")
library(RSelenium)

RSelenium::checkForServer()
RSelenium::startServer() 

binaryExtension <- paste0(Sys.getenv('HOME'),"/Desktop/tor-browser_en-US/Browser/firefox")
remDr <- remoteDriver(dir = binaryExtention)

remDr$open()
remDr$navigate("http://myexternalip.com/raw")
remDr$quit()

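This returns the following error (note that the variable is defined as binaryExtension but passed as binaryExtention):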

Error in callSuper(...) : object 'binaryExtention' not found

For the community's reference, this Selenium code works on Windows with Python 3:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

from os.path import expanduser, join  # expanduser("~") resolves the current user's home directory

# Build the paths from the home directory instead of hard-coding the user name
tor_dir = join(expanduser("~"), "Desktop", "Tor Browser", "Browser")
binary = FirefoxBinary(join(tor_dir, "firefox.exe"))
profile = FirefoxProfile(join(tor_dir, "TorBrowser", "Data", "Browser", "profile.default"))

driver = webdriver.Firefox(profile, binary)
driver.get('http://myexternalip.com/raw')   
html = driver.page_source
soup = BeautifulSoup(html, "lxml") # lxml needed

# driver.close()

# line.strip('\n')
print("Current Tor IP: " + soup.text.strip('\n'))

# Based in part on
# http://stackoverflow.com/questions/13960326/how-can-i-parse-a-website-using-selenium-and-beautifulsoup-in-python
# http://stackoverflow.com/questions/34316878/python-selenium-binding-with-tor-browser
# http://stackoverflow.com/questions/3367288/insert-variable-values-into-a-string-in-python
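The script above reports whatever text the page returns, so an error page would be printed as if it were an address. A minimal sketch of a sanity check using the standard-library ipaddress module follows; parse_ip_text is a hypothetical helper name, not part of the original script.

```python
import ipaddress


def parse_ip_text(page_text):
    """Return the stripped text if it is a valid IPv4/IPv6 address, else None."""
    candidate = page_text.strip()
    try:
        # ip_address raises ValueError for anything that is not an IP literal
        ipaddress.ip_address(candidate)
    except ValueError:
        return None
    return candidate
```

In the script it could be called as parse_ip_text(soup.text), treating a None result as a sign that the page did not load as expected.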
r selenium-webdriver tor rselenium
3 Answers
5
votes

Something like the following should work:

browserP <- paste0(Sys.getenv('HOME'),"/Desktop/tor-browser_en-US/Browser/firefox")
jArg <- paste0("-Dwebdriver.firefox.bin='", browserP, "'")
selServ <- RSelenium::startServer(javaargs = jArg)

Update:

This works for me on Windows. First, run the beta version:

checkForServer(update = TRUE, beta = TRUE, rename = FALSE)

Next, manually open a copy of Tor Browser.

library(RSelenium)
browserP <- "C:/Users/john/Desktop/Tor Browser/Browser/firefox.exe"
jArg <- paste0("-Dwebdriver.firefox.bin=\"", browserP, "\"")
pLoc <- "C:/Users/john/Desktop/Tor Browser/Browser/TorBrowser/Data/Browser/profile.meek-http-helper/"
jArg <- c(jArg, paste0("-Dwebdriver.firefox.profile=\"", pLoc, "\""))
selServ <- RSelenium::startServer(javaargs = jArg)

remDr <- remoteDriver(extraCapabilities = list(marionette = TRUE))
remDr$open()
remDr$navigate("https://check.torproject.org/")

> remDr$getTitle()
[[1]]
[1] "Congratulations. This browser is configured to use Tor."

3
votes

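This works on macOS Sierra.

First, you need to configure a manual proxy in both Firefox and Tor Browser.

Go to Preferences > Advanced > Network > Settings.

Set the SOCKS host to 127.0.0.1 and the port to 9150, and select SOCKS v5.

Tor Browser must also be open while the R script runs in RStudio; otherwise Firefox shows the message "The proxy server is refusing connections".

You also need to copy the name of your Firefox profile (profile-name) into the script.

Open Finder and go to /Users/username/Library/Application Support/Firefox/Profiles/profile-name

My R test script: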

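library(RSelenium)

# Replace "username" and the profile directory name with your own.
# The path needs no backslash before the space; "\ " is an invalid escape in an R string.
fprof <- getFirefoxProfile("/Users/username/Library/Application Support/Firefox/Profiles/nfqudbv2.default-1484451212373", useBase = TRUE)

remDrv <- remoteDriver(browserName = "firefox",
                       extraCapabilities = fprof)

remDrv$open()
remDrv$navigate("https://check.torproject.org/")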

This opens a Firefox instance showing the message "Congratulations. This browser is configured to use Tor."


1
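votes

Caveat: I have not tested this extensively, but it seems to work.

This relies on some ideas from @Ashley72 but avoids the manual setup and copying (as well as the now-defunct RSelenium functions that @jdarrison's solution needs), plus some ideas from https://indranilgayen.wordpress.com/2016/10/24/make-rselenium-work-with-r/. Set the following profile options (I usually adjust many other options as well, but they do not seem relevant to this question):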

fprof <- makeFirefoxProfile(list(
  network.proxy.socks = "127.0.0.1",      # proxy host IP
  network.proxy.socks_port = 9150L,       # proxy port; the trailing "L" (integer) is essential, without it the setting has no effect
  network.proxy.type = 1L,                # 1 = manual, 2 = automatic configuration script; the "L" matters here too
  network.proxy.socks_version = 5L,       # ditto
  network.proxy.socks_remote_dns = TRUE   # resolve host names through the proxy
))
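To make the typing remark concrete: Firefox stores profile preferences as user_pref(...) lines, where strings are quoted while integers and booleans are not, so a port written as a string is not the same setting as an integer port. The sketch below is not RSelenium's implementation; render_user_js and TOR_PROXY_PREFS are hypothetical names used only to illustrate the same five options in Python.

```python
import json

# The same proxy options as in the profile above, with Python types.
TOR_PROXY_PREFS = {
    "network.proxy.type": 1,               # 1 = manual proxy configuration
    "network.proxy.socks": "127.0.0.1",    # SOCKS host
    "network.proxy.socks_port": 9150,      # SOCKS port used by Tor Browser in this thread
    "network.proxy.socks_version": 5,
    "network.proxy.socks_remote_dns": True,
}


def render_user_js(prefs):
    """Render a dict of preferences as user_pref(...) lines."""
    lines = []
    for name, value in prefs.items():
        if isinstance(value, bool):        # check bool first: bool is a subclass of int
            literal = "true" if value else "false"
        elif isinstance(value, int):
            literal = str(value)           # integers are written unquoted
        else:
            literal = json.dumps(str(value))  # strings are written double-quoted
        lines.append("user_pref(%s, %s);" % (json.dumps(name), literal))
    return "\n".join(lines) + "\n"
```

Calling print(render_user_js(TOR_PROXY_PREFS)) shows the quoted host next to the unquoted port and version numbers.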

Then start the server as usual:

rD <- rsDriver(port = 4445L, browser = "firefox", version = "latest", geckover = "latest", iedrver = NULL, phantomver = "2.1.1",
               verbose = TRUE, check = TRUE, extraCapabilities = fprof) # works for selenium server: 3.3.1 and geckover: 0.15.0; Firefox: 52
remDr <- rD[["client"]] # equivalent to rD$client
remDr$navigate("https://check.torproject.org/") # should confirm Tor is set up
remDr$navigate("http://whatismyip.org/") # should show the Tor IP rather than your own

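As you can see, I did not change the marionette option. I do not know what effect that might have. Please comment.

Edit: It seems Tor Browser must be up and running. Otherwise, the browser opened by RSelenium gives the error "The proxy server is refusing connections."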
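Since the "refusing connections" error simply means nothing is listening on the proxy port, it can be ruled out before a browser is launched. A minimal sketch using only the standard library follows; socks_port_open is a hypothetical helper name, and 127.0.0.1:9150 is the address used in the answers above, which may differ on other setups.

```python
import socket


def socks_port_open(host="127.0.0.1", port=9150, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A True result only shows that some process is listening on the port; it does not prove that the process is Tor, so check.torproject.org remains the real confirmation.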
