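I've run into a strange problem when sending an HTTP GET request to the URL "https://www.nasdaq.com/feed/rssoutbound?category=FinTech". When I type this URL into a web browser manually, the feed loads without any problems. However, when I make the same request programmatically (I've tried both C# with HttpClient and Node.js with Axios), the request hangs indefinitely and eventually times out.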
Here is my C# code:
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public async Task Execute()
{
    using HttpClient httpClient = new HttpClient();
    try
    {
        // Specify the URL you want to request
        string url = "https://www.nasdaq.com/feed/rssoutbound?category=FinTech";
        httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("*/*"));
        httpClient.DefaultRequestHeaders.AcceptLanguage.Add(new StringWithQualityHeaderValue("en-US"));
        httpClient.DefaultRequestHeaders.AcceptLanguage.Add(new StringWithQualityHeaderValue("en", 0.9));
        httpClient.DefaultRequestHeaders.Add("sec-ch-ua", "\"Chromium\";v=\"116\", \"Not)A;Brand\";v=\"24\", \"Microsoft Edge\";v=\"116\"");
        httpClient.DefaultRequestHeaders.Add("sec-ch-ua-mobile", "?0");
        httpClient.DefaultRequestHeaders.Add("sec-ch-ua-platform", "\"Windows\"");
        httpClient.DefaultRequestHeaders.Add("sec-fetch-dest", "empty");
        httpClient.DefaultRequestHeaders.Add("sec-fetch-mode", "cors");
        httpClient.DefaultRequestHeaders.Add("sec-fetch-site", "same-origin");
        // Add the User-Agent header
        httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36");
        // Send a GET request and wait for the response
        HttpResponseMessage response = await httpClient.GetAsync(url);
        // Check if the request was successful
        if (response.IsSuccessStatusCode)
        {
            // Read the content of the response as a string
            string content = await response.Content.ReadAsStringAsync();
            // Print the content to the console
            Console.WriteLine(content);
        }
        else
        {
            Console.WriteLine($"HTTP request failed with status code: {response.StatusCode}");
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
        throw;
    }
}
I also tried making the same request with Node.js and Axios, with the same result:
// Node.js code using Axios
const axios = require('axios');

async function fetchData() {
    try {
        const response = await axios.get('https://www.nasdaq.com/feed/rssoutbound?category=FinTech', {
            headers: {
                Accept: '*/*',
                'Accept-Language': 'en-US,en;q=0.9,tr;q=0.8',
                'sec-ch-ua': '"Chromium";v="116", "Not)A;Brand";v="24", "Microsoft Edge";v="116"',
                'sec-ch-ua-mobile': '?0',
                'sec-ch-ua-platform': '"Windows"',
                'sec-fetch-dest': 'empty',
                'sec-fetch-mode': 'cors',
                'sec-fetch-site': 'same-origin',
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'
            }
        });
        // ... (handling the response)
    } catch (error) {
        console.error('An error occurred:', error.message);
    }
}

fetchData();
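Because the symptom is a hang rather than an error, it can help to make such requests fail fast while experimenting. A minimal sketch, assuming Node.js 18 or later: the hypothetical helper withDeadline hands an AbortSignal to the operation, aborts it after the given number of milliseconds, and rejects even if the operation ignores the signal.

```javascript
// Runs `operation(signal)` and rejects if it has not settled within `ms` milliseconds.
// `withDeadline` is a hypothetical helper, not part of Axios or Node.js.
function withDeadline(operation, ms) {
  const controller = new AbortController();
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`Request timed out after ${ms} ms`);
      controller.abort(err); // ask the operation to stop, if it honours the signal
      reject(err);           // and give up even if it does not
    }, ms);
  });
  // Wrapping in `new Promise` also turns a synchronous throw into a rejection.
  const work = new Promise(resolve => resolve(operation(controller.signal)));
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
```

For example, withDeadline(signal => axios.get(url, { headers, signal }), 10000) would abort the Axios call after ten seconds, since Axios accepts an AbortSignal through its signal option.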
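Oddly, fetching feeds from other RSS sources works fine; the problem seems to be specific to the Nasdaq feed. I even tried fetching the content with Puppeteer, and it also hangs indefinitely.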
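Using the Edge browser's developer tools, I resent the request with all editable headers cleared and no cookies, and it still worked. So I doubt the problem is related to cookies.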
The following PowerShell script also works:
$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.81"
Invoke-WebRequest -UseBasicParsing -Uri "https://www.nasdaq.com/feed/rssoutbound?category=FinTech" `
    -WebSession $session `
    -Headers @{
        "authority"="www.nasdaq.com"
        "method"="GET"
        "path"="/feed/rssoutbound?category=FinTech"
        "scheme"="https"
        "accept"="*/*"
        "accept-encoding"="gzip, deflate, br"
        "accept-language"="en-US,en;q=0.9,tr;q=0.8"
        "sec-ch-ua"="`"Chromium`";v=`"116`", `"Not)A;Brand`";v=`"24`", `"Microsoft Edge`";v=`"116`""
        "sec-ch-ua-mobile"="?0"
        "sec-ch-ua-platform"="`"Windows`""
        "sec-fetch-dest"="empty"
        "sec-fetch-mode"="cors"
        "sec-fetch-site"="same-origin"
    }
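Comparing the working request with a failing one narrows the search. A small sketch (the helper missingHeaders is hypothetical) that lists header names present in the PowerShell request but absent from the Axios request, ignoring case and ignoring authority, method, path and scheme, which look like HTTP/2 pseudo-headers (:authority, :method, :path, :scheme) copied from the browser's developer tools. It only compares the headers written in the two snippets; the client libraries may add their own defaults on the wire.

```javascript
// Assumption: these names were copied from DevTools' HTTP/2 pseudo-headers
// and are not meaningful as ordinary request headers.
const PSEUDO_HEADERS = new Set(['authority', 'method', 'path', 'scheme']);

// Returns header names present in `working` but absent from `failing`,
// compared case-insensitively because HTTP header names are case-insensitive.
function missingHeaders(working, failing) {
  const present = new Set(failing.map(name => name.toLowerCase()));
  return working
    .map(name => name.toLowerCase())
    .filter(name => !PSEUDO_HEADERS.has(name) && !present.has(name))
    .sort();
}

// Header names from the PowerShell snippet (User-Agent is set via $session).
const powershellHeaders = [
  'authority', 'method', 'path', 'scheme', 'accept', 'accept-encoding',
  'accept-language', 'sec-ch-ua', 'sec-ch-ua-mobile', 'sec-ch-ua-platform',
  'sec-fetch-dest', 'sec-fetch-mode', 'sec-fetch-site', 'User-Agent',
];

// Header names from the Axios snippet.
const axiosHeaders = [
  'Accept', 'Accept-Language', 'sec-ch-ua', 'sec-ch-ua-mobile',
  'sec-ch-ua-platform', 'sec-fetch-dest', 'sec-fetch-mode', 'sec-fetch-site',
  'User-Agent',
];

console.log(missingHeaders(powershellHeaders, axiosHeaders)); // [ 'accept-encoding' ]
```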
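What exactly causes this problem with the Nasdaq feed? Is there something unique about their server configuration, or about how they handle requests, that could lead to this behavior? Any insights or suggestions would be greatly appreciated.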
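The problem was evidently the Gzip handling: unlike the original C# code, the working PowerShell request explicitly sends an accept-encoding header. The following code, which sends that header and decompresses the gzip response, works: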
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

public async Task GetRssFeed(string url)
{
    using HttpClient httpClient = new HttpClient();
    try
    {
        HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Add("authority", "www.nasdaq.com");
        request.Headers.Add("accept", "*/*");
        request.Headers.Add("accept-encoding", "gzip, deflate, br");
        request.Headers.Add("accept-language", "en-US,en;q=0.9,tr;q=0.8");
        request.Headers.Add("sec-ch-ua", "\"Chromium\";v=\"116\", \"Not)A;Brand\";v=\"24\", \"Microsoft Edge\";v=\"116\"");
        request.Headers.Add("sec-ch-ua-mobile", "?0");
        request.Headers.Add("sec-ch-ua-platform", "\"Windows\"");
        request.Headers.Add("sec-fetch-dest", "empty");
        request.Headers.Add("sec-fetch-mode", "cors");
        request.Headers.Add("sec-fetch-site", "same-origin");
        request.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.81");
        HttpResponseMessage response = await httpClient.SendAsync(request);
        response.EnsureSuccessStatusCode();
        if (response.Content.Headers.ContentEncoding.Contains("gzip"))
        {
            // The body is gzip-compressed: decompress it before reading it as text.
            await using var responseStream = await response.Content.ReadAsStreamAsync();
            await using var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress);
            using var streamReader = new StreamReader(decompressedStream);
            string decompressedContent = await streamReader.ReadToEndAsync();
            Console.WriteLine(decompressedContent);
        }
        else
        {
            // Not gzip-encoded: read the body as-is instead of printing nothing.
            // (A "deflate" or "br" response would need its own decompression step.)
            string content = await response.Content.ReadAsStringAsync();
            Console.WriteLine(content);
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
        throw;
    }
}
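The C# answer decompresses gzip only, although it advertises gzip, deflate and br. A Node.js sketch of the same idea that decodes whichever of those codings the server reports in Content-Encoding, using only the built-in zlib and https modules. The names decodeBody and fetchFeed are hypothetical, and the sketch has not been tried against the Nasdaq endpoint.

```javascript
const zlib = require('node:zlib');
const https = require('node:https');

// Decodes a raw response body according to its Content-Encoding header value.
// Handles the codings advertised by "Accept-Encoding: gzip, deflate, br".
function decodeBody(raw, contentEncoding = '') {
  // Content-Encoding lists codings in the order they were applied,
  // e.g. "deflate, gzip", so they are undone in reverse order.
  const codings = contentEncoding
    .split(',')
    .map(c => c.trim().toLowerCase())
    .filter(c => c && c !== 'identity')
    .reverse();
  let body = raw;
  for (const coding of codings) {
    switch (coding) {
      case 'gzip':
      case 'x-gzip':
        body = zlib.gunzipSync(body);
        break;
      case 'deflate': // HTTP "deflate" is zlib-wrapped, which inflateSync expects
        body = zlib.inflateSync(body);
        break;
      case 'br':
        body = zlib.brotliDecompressSync(body);
        break;
      default:
        throw new Error(`Unsupported Content-Encoding: ${coding}`);
    }
  }
  return body.toString('utf8');
}

// Sends a GET that advertises compression, then decodes the reply.
// Extra headers (User-Agent etc.) can be passed in and override the default.
function fetchFeed(url, headers = {}) {
  return new Promise((resolve, reject) => {
    const options = { headers: { 'Accept-Encoding': 'gzip, deflate, br', ...headers } };
    https.get(url, options, res => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`HTTP ${res.statusCode}`));
        return;
      }
      const chunks = [];
      res.on('data', chunk => chunks.push(chunk));
      res.on('error', reject);
      res.on('end', () => {
        try {
          resolve(decodeBody(Buffer.concat(chunks), res.headers['content-encoding']));
        } catch (err) {
          reject(err);
        }
      });
    }).on('error', reject);
  });
}
```

Usage would look like fetchFeed('https://www.nasdaq.com/feed/rssoutbound?category=FinTech', { 'User-Agent': '...' }).then(console.log), ideally combined with a timeout while testing.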