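What I am trying to do is fetch 1 million URLs over a gigabit connection. The throughput varies between 5 MB/s and 12 MB/s (megabytes per second), which is far below the maximum bandwidth. The code I am using: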
DnsResolver dnsResolver = new SystemDefaultDnsResolver();
X509HostnameVerifier hostnameVerifier = new AllowAllHostnameVerifier();
SSLContext sslcontext = SSLContexts.createSystemDefault();
RedirectStrategy redirectStrategy = new LaxRedirectStrategy();
HttpConnectionFactory<HttpRoute, ManagedHttpClientConnection> connFactory = new ManagedHttpClientConnectionFactory(
new DefaultHttpRequestWriterFactory(),
new DefaultHttpResponseParserFactory());
Registry<ConnectionSocketFactory> socketFactoryRegistry = RegistryBuilder
.<ConnectionSocketFactory> create()
.register(
"https",
new SSLConnectionSocketFactory(sslcontext,
hostnameVerifier))
.register("http", new PlainConnectionSocketFactory())
.build();
SocketConfig socketConfig = SocketConfig.custom().setSoKeepAlive(false)
.setSoReuseAddress(false)
.setSoTimeout(15000).build();
PoolingHttpClientConnectionManager manager = new PoolingHttpClientConnectionManager(socketFactoryRegistry,connFactory, dnsResolver);
manager.setDefaultSocketConfig(socketConfig);
manager.setMaxTotal(1000);
CloseableHttpClient httpClient = HttpClientBuilder.create().setUserAgent("Mozilla")
.setConnectionManager(manager)
.setRedirectStrategy(redirectStrategy)
.setMaxConnPerRoute(-1).build();
RequestConfig defaultConfig = RequestConfig.custom()
.setCookieSpec(CookieSpecs.IGNORE_COOKIES)
.setExpectContinueEnabled(false)
.setStaleConnectionCheckEnabled(false)
.setRedirectsEnabled(true)
.setMaxRedirects(5).build();
RequestConfig rConfig= RequestConfig.copy(defaultConfig)
.setSocketTimeout(15000)
.setConnectionRequestTimeout(-1)
.setConnectTimeout(15000).build();
ExecutorService executorService = Executors.newFixedThreadPool(640);
FutureRequestExecutionService service = new FutureRequestExecutionService(httpClient, executorService);
The configuration for each request is:
HttpGet httpget = new HttpGet("some url");
httpget.setConfig(rConfig);
httpget.setHeader("Connection", "close");
In the ResponseHandler, I consume the content with the following code:
stream = response.getEntity().getContent();
final byte[] content = IOUtils.toByteArray(stream);
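Each URL is from a different domain. The machine has 8 cores and 8 GB of RAM and runs 64-bit Debian Linux. How can I speed this up?
If you do not need automatic authentication, retries, or cookie management, and do not mind handling redirects manually, consider using the minimal HttpClient implementation. The minimal HC is built with a minimal request execution pipeline consisting only of the mandatory protocol interceptors, and it should have the best performance characteristics given the same concurrency parameters (connection pool settings).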
PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
CloseableHttpClient hc = HttpClients.createMinimal(cm);
And naturally, you should want to reuse connections for optimal performance. This looks contrary to what I would consider best practice:
httpget.setHeader("Connection", "close"); // Huh?
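Since the minimal client leaves redirects to the caller, here is a rough sketch of the decision logic using only java.net.URI. The names RedirectResolver, next, and MAX_REDIRECTS are hypothetical and not part of HttpClient; the limit of 5 simply mirrors setMaxRedirects(5) from the question, and the status code and Location value are assumed to be read from the response by the caller.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical helper (not HttpClient API): decides where a redirect response points.
final class RedirectResolver {
    // Assumption: same hop limit as setMaxRedirects(5) in the question.
    static final int MAX_REDIRECTS = 5;

    private RedirectResolver() {}

    // Status codes that normally carry a Location header worth following for a GET.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Returns the next URI to fetch, or empty when the response is not a redirect,
    // has no usable Location header, or the hop limit has been reached.
    // URI.resolve throws IllegalArgumentException for a malformed Location value.
    static Optional<URI> next(URI current, int status, String location, int hopsSoFar) {
        if (!isRedirect(status) || location == null || location.trim().isEmpty()
                || hopsSoFar >= MAX_REDIRECTS) {
            return Optional.empty();
        }
        // resolve() handles both absolute and relative Location values.
        return Optional.of(current.resolve(location.trim()).normalize());
    }
}
```

In the ResponseHandler you would call next(...) with the current request URI and a hop counter, and submit a new HttpGet only when it returns a value.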