Why is Apache HttpClient 2x slower than Java's URL.openConnection?


Consider this code:

package com.zip;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpHead;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.Date;

import static com.diffplug.common.base.Errors.rethrow;

/**
 * @author nsheremet
 */
public class ParallelDownload2 {
  public static int THREADCOUNT = 20;
  private static final String URL = "https://server.com/myfile.zip";
  public static String OUTPUT = "C:\\!deleteme\\myfile.zip";
  public static void main(String[] args) throws Exception {
    System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
    System.out.println(new Date());

    CloseableHttpClient httpClient = HttpClients.createDefault();

    HttpGet request = new HttpGet(URL);
    request.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36");
    CloseableHttpResponse response = rethrow().wrap(() -> httpClient.execute(request)).get();
    long contentLength = Long.parseLong(response.getFirstHeader("Content-Length").getValue());
    long blocksize = contentLength / THREADCOUNT;

    RandomAccessFile randomAccessFile = new RandomAccessFile(new File(OUTPUT), "rwd");
    randomAccessFile.setLength(contentLength);
    randomAccessFile.close();
    response.close();

    for (long i = 0; i <THREADCOUNT; i++) {
      long startpos = i * blocksize;
      long endpos = (i + 1) * blocksize - 1;
      if (i == THREADCOUNT - 1) {
        // HTTP byte ranges are inclusive, so the last byte is at contentLength - 1
        endpos = contentLength - 1;
      }
      new Thread(new DownloadTask(i, startpos, endpos)).start();
    }
    System.out.println(new Date());
  }

  public static class DownloadTask implements Runnable {

    public DownloadTask(
        long id,
        long startpos,
        long endpos
    ) {
      this.id = id;
      this.startpos = startpos;
      this.endpos = endpos;
    }

    long id;
    long startpos;
    long endpos;

    @Override
    public void run() {
      // try-with-resources closes the client and the response even if the status is not 206
      try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
        HttpGet request = new HttpGet(URL);
        request.addHeader("Range", "bytes=" + startpos + "-" + endpos);
        request.addHeader("Connection", "keep-alive");
        request.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36");
        try (CloseableHttpResponse response = httpClient.execute(request)) {
          // 206 Partial Content: the server honored the Range header
          if (response.getStatusLine().getStatusCode() == 206) {
            try (InputStream is = response.getEntity().getContent();
                 RandomAccessFile randomAccessFile = new RandomAccessFile(new File(OUTPUT), "rwd")) {
              // Write this block at its own offset in the pre-sized output file
              randomAccessFile.seek(startpos);
              int len;
              byte[] buffer = new byte[1024 * 10];
              while ((len = is.read(buffer)) != -1) {
                randomAccessFile.write(buffer, 0, len);
              }
            }
            System.out.println("Thread " + Thread.currentThread().getId() + ": Download");
          }
        }
      } catch (IOException e) {
        e.printStackTrace();
      }
      System.out.println(new Date());
    }

  }

}
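The Range values in this code come from splitting contentLength into THREADCOUNT blocks. Below is a small, dependency-free sketch of that split, assuming the inclusive semantics of HTTP byte ranges (RFC 7233: "bytes=0-99" is the first 100 bytes), so the final block ends at contentLength - 1. ByteRanges, split and header are hypothetical names, and capping the block count at contentLength is an added assumption so that no range is empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits contentLength bytes into inclusive HTTP byte ranges.
final class ByteRanges {
    private ByteRanges() {}

    // Returns {start, end} pairs. Both ends are inclusive, as in an HTTP Range header,
    // so the last pair ends at contentLength - 1 and absorbs the division remainder.
    static List<long[]> split(long contentLength, int parts) {
        if (contentLength <= 0 || parts <= 0) {
            throw new IllegalArgumentException("contentLength and parts must be positive");
        }
        // Never create more parts than bytes, otherwise blockSize would be 0
        int n = (int) Math.min(parts, contentLength);
        long blockSize = contentLength / n;
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long start = i * blockSize;
            long end = (i == n - 1) ? contentLength - 1 : (i + 1) * blockSize - 1;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    // Formats one pair as a Range header value, e.g. "bytes=0-332".
    static String header(long[] range) {
        return "bytes=" + range[0] + "-" + range[1];
    }
}
```

For a 1000-byte file split three ways this yields 0-332, 333-665 and 666-999; each pair can be passed as startpos/endpos to a DownloadTask.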

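This is a copy of this one, which was written with plain URL.openConnection. Why does the multithreaded URL.openConnection version download the file at 10 Mb/s, while the Apache HTTP client version usually manages only 1-5 Mb/s? Am I missing something in the Apache HTTP client setup?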

Update

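  1. I use multiple HttpClient instances because a single instance performs no better than a single connection through URL.openConnection.
  2. Apache HTTP client is used in many high-performance servers, so I believe there must be a configuration problem. But what exactly?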

About the code

This is certainly not production-ready code; treat it as a prototype of how I want to do a fast multithreaded download.

About multithreading

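I can't explain why, since I don't own the resource being downloaded, but a multithreaded download is much faster (about ten times) than a single-threaded one.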

java apache-httpclient-4.x java-io
Answer

A few possibilities.

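(1) The difference may not be the transfer speed but the initial latency when connecting. I ran into a similar problem in the past, and the culprit was IPv6: the initial request was made over IPv6, and it silently fell back to IPv4 only after a timeout.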

Try running with -Djava.net.preferIPv4Stack=true, or specify the host as a numeric IPv4 dotted quad, and see whether it makes a difference.
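One way to check the IPv6 theory from Java, and to obtain a numeric IPv4 address to try in the URL, is to look at what the resolver returns and in which order. This is a minimal sketch using only java.net; Ipv4Picker, firstIpv4, ipv4Literal and dump are hypothetical names, not part of any library.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper, not part of any library: mimics "use a numeric IPv4 quad".
final class Ipv4Picker {
    private Ipv4Picker() {}

    // Returns the first IPv4 address in resolver order, skipping any IPv6 entries.
    static Optional<Inet4Address> firstIpv4(InetAddress[] addresses) {
        return Arrays.stream(addresses)
                .filter(a -> a instanceof Inet4Address)
                .map(a -> (Inet4Address) a)
                .findFirst();
    }

    // Resolves host (a DNS lookup unless it is already an IP literal) and returns the
    // first IPv4 address as a dotted quad, or empty if there is none.
    static Optional<String> ipv4Literal(String host) {
        try {
            return firstIpv4(InetAddress.getAllByName(host)).map(InetAddress::getHostAddress);
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    // Prints every resolved address in order, to see whether IPv6 entries come first.
    static void dump(String host) throws UnknownHostException {
        for (InetAddress a : InetAddress.getAllByName(host)) {
            System.out.println((a instanceof Inet4Address ? "IPv4 " : "IPv6 ") + a.getHostAddress());
        }
    }
}
```

With HTTPS, putting the numeric address into the URL will usually make certificate hostname verification fail (the certificate names the host, not the IP), so -Djava.net.preferIPv4Stack=true is the less invasive experiment. That property is only read when the JVM's networking starts, so it belongs on the command line rather than in a System.setProperty call in main.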

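(2) The difference may be caused by the HTTPS implementation blocking on certificate path validation, online revocation checks, and so on. You can benchmark against a plain HTTP server to determine whether this is the cause. If it is, check Apache's documentation on how to configure the HTTPS behavior to your liking.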
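To tell whether the gap lies in connection setup (DNS, TCP connect, TLS handshake and certificate checks) or in raw transfer, one can time the two phases separately. The sketch below uses the JDK's HttpURLConnection instead of Apache HttpClient, an assumption made so it runs without extra dependencies; DownloadTimer, measure and megabitsPerSecond are hypothetical names. The same split can be made around httpClient.execute(...) in the question's code, since execute returns once the response headers have arrived and reading the entity is the transfer phase.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

final class DownloadTimer {
    private DownloadTimer() {}

    // Converts a byte count and a duration in nanoseconds into megabits per second.
    static double megabitsPerSecond(long bytes, long nanos) {
        if (nanos <= 0) {
            throw new IllegalArgumentException("nanos must be positive");
        }
        return (bytes * 8.0 / 1_000_000.0) / (nanos / 1_000_000_000.0);
    }

    // Fetches url once and prints how long the response headers took (DNS, TCP connect,
    // and for https the TLS handshake) versus how long the body took to stream.
    static void measure(String url) throws IOException {
        long t0 = System.nanoTime();
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            int status = conn.getResponseCode(); // blocks until the status line and headers arrive
            long t1 = System.nanoTime();
            long bytes = 0;
            byte[] buffer = new byte[64 * 1024];
            // getInputStream() throws for 4xx/5xx responses; acceptable for a quick experiment
            try (InputStream in = conn.getInputStream()) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    bytes += n;
                }
            }
            long t2 = System.nanoTime();
            System.out.printf(
                "%s -> HTTP %d, headers after %d ms, %d body bytes in %d ms (%.2f Mbit/s)%n",
                url, status, (t1 - t0) / 1_000_000, bytes, (t2 - t1) / 1_000_000,
                megabitsPerSecond(bytes, Math.max(1, t2 - t1)));
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, calling DownloadTimer.measure("https://server.com/myfile.zip") and then the same path over http:// (if the server offers it) shows whether most of the difference sits in the header time, which would point at connection setup or TLS rather than throughput. Note that it downloads the whole file each time.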

(3) In any case, running tcpdump or Wireshark may give you more useful information.

© www.soinside.com 2019 - 2024. All rights reserved.