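I am trying to download a file from a URL over HTTPS in Java. The download works perfectly in a web browser, but when I make the same request from a Java connection, with the browser's headers set on the request, I get a 403 (Forbidden) response. My download code is below. The error occurs on this line: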
InputStream in = connection.getInputStream();
// Requires: java.io.*, java.net.HttpURLConnection, java.net.URL
public void DownloadFile(String year, String month, String day) throws IOException {
    String fileName = day + "_" + month + "_" + year + ".zip";
    HttpURLConnection connection = null;
    try {
        //String tempURL = defaultUrl + "/" + year + "/" + month + "/" + "cm" + day + month + year + "bhav.csv.zip";
        String tempURL = "https://www.nseindia.com/content/historical/EQUITIES/1995/JAN/cm04JAN1995bhav.csv.zip";
        URL url = new URL(tempURL);
        connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        // Headers copied from the browser request.
        // Note: HttpURLConnection treats Host as a restricted header by default, so this call
        // may be silently ignored unless sun.net.http.allowRestrictedHeaders is set to true.
        connection.setRequestProperty("Host", "www.nseindia.com:443");
        connection.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
        connection.setRequestProperty("Sec-Fetch-Mode", "navigate");
        connection.setRequestProperty("Sec-Fetch-Site", "same-origin");
        connection.setRequestProperty("Sec-Fetch-User", "?1");
        connection.setRequestProperty("Upgrade-Insecure-Requests", "1");
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36");
        // destFolder is a field defined elsewhere in the class
        File file = new File(destFolder, fileName);
        // getInputStream() is where the 403 surfaces; both streams are closed automatically
        try (InputStream in = connection.getInputStream();
             FileOutputStream out = new FileOutputStream(file)) {
            copy(in, out, 1024);
        }
        System.out.println("Downloaded ......... " + fileName);
    } catch (Exception ex) {
        ex.printStackTrace();
        System.out.println("Not Found ......... " + fileName);
    } finally {
        if (connection != null) {
            connection.disconnect();
        }
    }
}
// Copies everything from input to output in chunks of bufferSize bytes.
public static void copy(InputStream input, OutputStream output, int bufferSize) throws IOException {
    byte[] buf = new byte[bufferSize];
    int n = input.read(buf);
    while (n >= 0) {
        output.write(buf, 0, n);
        n = input.read(buf);
    }
    output.flush();
}
I captured the request headers with "Live HTTP Headers" while downloading the file through the Google Chrome browser.
The request and response headers follow:
GET /content/historical/EQUITIES/1995/JAN/cm02JAN1995bhav.csv.zip HTTP/1.1
Host: www.nseindia.com:443
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 4177
Content-Type: application/zip
Date: Tue, 10 Dec 2019 08:21:48 GMT
ETag: "1051-47ca323fae000"
Last-Modified: Fri, 08 Jan 2010 08:40:32 GMT
Server: Apache
X-FRAME-OPTIONS: SAMEORIGIN
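To compare the browser capture above with what the Java side has actually staged, it can help to print the request headers held by the connection before it connects. The following is only a minimal sketch: the class name `HeaderDump` and the method `formatHeaders` are made up for illustration, and the headers set in `main` are a shortened copy of the ones from the question. Headers that the JDK treats as restricted may not appear in the output, which is worth checking for `Host`.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class HeaderDump {

    /** Renders a header map as "Name: value1, value2" lines, one header per line. */
    public static String formatHeaders(Map<String, List<String>> headers) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            sb.append(entry.getKey())
              .append(": ")
              .append(String.join(", ", entry.getValue()))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("https://www.nseindia.com/content/historical/EQUITIES/1995/JAN/cm04JAN1995bhav.csv.zip");
        // openConnection() only creates the object; no network traffic happens yet
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestProperty("Host", "www.nseindia.com:443");
        connection.setRequestProperty("Sec-Fetch-Mode", "navigate");
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        // getRequestProperties() must be called before the connection is opened
        System.out.print(formatHeaders(connection.getRequestProperties()));
    }
}
```

Any header that is present in the Chrome capture but missing from this output was not accepted by `HttpURLConnection`, so the two requests are not identical even if the code appears to set the same values.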
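I ran into the same problem. It appears to be a server-side protection against requests coming from Java clients, but I don't understand how it works. I also reproduced exactly the same request as my web browser: headers, cookies, parameters, and so on...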
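One way to learn more about why the server refuses the request is to read the status line and the error body instead of letting `getInputStream()` throw. The sketch below assumes nothing about how the site's protection works; it only prints what the server sends back. The class name `ForbiddenProbe` and its methods are hypothetical names chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ForbiddenProbe {

    /** Reads a stream fully as UTF-8 text; a null stream yields an empty string. */
    public static String readAll(InputStream in) {
        if (in == null) {
            return "";
        }
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) >= 0) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Summarizes the status line, the Server header, and any error body. */
    public static String describe(HttpURLConnection connection) throws IOException {
        // getResponseCode() returns 403 rather than throwing
        int status = connection.getResponseCode();
        String summary = status + " " + connection.getResponseMessage()
                + "\nServer: " + connection.getHeaderField("Server");
        if (status >= 400) {
            // getErrorStream() may be null if the server sent no body
            summary += "\n" + readAll(connection.getErrorStream());
        }
        return summary;
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("https://www.nseindia.com/content/historical/EQUITIES/1995/JAN/cm04JAN1995bhav.csv.zip");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        System.out.println(describe(connection));
        connection.disconnect();
    }
}
```

The error body and the `Server` header sometimes reveal whether the refusal comes from the origin server or from something in front of it, which narrows down what the request is missing.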