Unable to download a web page in .NET

Question (votes: 0, answers: 2)

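I wrote a batch job that parses HTML pages from gearbest.com to extract item data (for example, the product link). It worked until 2-3 weeks ago, when the site was updated. Since then I can no longer download the pages for parsing, and I cannot figure out why. Before the update, I made the request with HtmlAgilityPack using the following code.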

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = null;
doc = web.Load(url); // this is the point where the exception is now thrown

I also tried without the library, and added some extra data (headers) to the request:

HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
request.Credentials = CredentialCache.DefaultCredentials;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.ContentType = "text/html; charset=UTF-8";
request.CookieContainer = new CookieContainer();
request.Headers.Add("accept-language", "it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7");
request.Headers.Add("accept-encoding", "gzip, deflate, br");
request.Headers.Add("upgrade-insecure-requests", "1");
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
request.CookieContainer = new CookieContainer();

WebResponse response = request.GetResponse();  // the exception is thrown here

The exceptions are:

  • IOException: Unable to read data from the transport connection
  • SocketException: The connection could not be established.

If I request the home page (https://it.gearbest.com) instead, it works.

What do you think the problem is?

c# html-agility-pack webrequest
2 Answers
0
votes

For some reason, the server does not like the supplied user agent. If you skip setting UserAgent, everything works:

HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
request.Credentials = CredentialCache.DefaultCredentials;
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.ContentType = "text/html; charset=UTF-8";
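To check this suggestion end to end, a minimal sketch along the following lines may help. It assumes the URL from the question; the class name `DownloadWithoutUserAgent` and the helper `BuildRequest` are hypothetical names introduced only for illustration. It builds the request without a User-Agent and, if the call still fails, prints the whole chain of inner exceptions so the underlying IOException or SocketException is visible.

```csharp
using System;
using System.IO;
using System.Net;

public class DownloadWithoutUserAgent
{
    // Hypothetical helper: builds the request as in the answer, with no User-Agent set.
    public static HttpWebRequest BuildRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Credentials = CredentialCache.DefaultCredentials;
        request.CookieContainer = new CookieContainer();
        // request.UserAgent is deliberately left untouched.
        return request;
    }

    public static void Main()
    {
        var request = BuildRequest("https://it.gearbest.com/tv-box/pp_009940949913.html");
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                string html = reader.ReadToEnd();
                Console.WriteLine("Status: {0}, characters read: {1}", response.StatusCode, html.Length);
            }
        }
        catch (WebException ex)
        {
            // Walk the inner exceptions to see what actually failed underneath.
            for (Exception e = ex; e != null; e = e.InnerException)
                Console.WriteLine("{0}: {1}", e.GetType().Name, e.Message);
        }
    }
}
```

Whether the server still accepts such a request today is not something this sketch can guarantee; it only reproduces the setup described in the answer.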

Another solution is to set request.Connection to an arbitrary string (but not keep-alive or close):

request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.Connection = "random value";

That works too, but I cannot explain why.
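The restriction on keep-alive and close comes from HttpWebRequest itself: its Connection setter throws an ArgumentException for those two values, which are meant to be controlled through the KeepAlive property instead. The sketch below illustrates this without sending anything over the network; the class name `ConnectionHeaderDemo` and the helper `IsAllowedConnectionValue` are hypothetical names used only for this example.

```csharp
using System;
using System.Net;

public class ConnectionHeaderDemo
{
    // Hypothetical helper: reports whether HttpWebRequest accepts a Connection value.
    public static bool IsAllowedConnectionValue(string value)
    {
        // WebRequest.Create only builds the object; no connection is opened here.
        var request = (HttpWebRequest)WebRequest.Create("https://it.gearbest.com/");
        try
        {
            request.Connection = value;
            return true;
        }
        catch (ArgumentException)
        {
            // Raised for the reserved values keep-alive and close.
            return false;
        }
    }

    public static void Main()
    {
        foreach (var value in new[] { "keep-alive", "close", "random value" })
            Console.WriteLine("{0,-14} allowed: {1}", value, IsAllowedConnectionValue(value));
    }
}
```

This only shows which values the client library accepts. It does not explain why the server responds differently once the header changes.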


0
votes

Maybe it is worth trying this...

// "request" is the HttpWebRequest instance from the question
request.KeepAlive = false;
request.ProtocolVersion = HttpVersion.Version10;

https://stackoverflow.com/a/16140621/1302730
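Applied to the request from the question, these two settings could look roughly like the sketch below. The class name `Http10Download` and the helper `BuildRequest` are hypothetical, and there is no guarantee that the site accepts HTTP/1.0 requests; this is simply one way to try the suggestion.

```csharp
using System;
using System.IO;
using System.Net;

public class Http10Download
{
    // Hypothetical helper: disables keep-alive and downgrades the request to HTTP/1.0.
    public static HttpWebRequest BuildRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.KeepAlive = false;                        // ask for the connection to be closed after the response
        request.ProtocolVersion = HttpVersion.Version10;  // use HTTP/1.0 instead of the default HTTP/1.1
        return request;
    }

    public static void Main()
    {
        var request = BuildRequest("https://it.gearbest.com/tv-box/pp_009940949913.html");
        try
        {
            using (var response = request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                string html = reader.ReadToEnd();
                Console.WriteLine("Downloaded {0} characters", html.Length);
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine("Request failed: {0}", ex.Message);
        }
    }
}
```

The downloaded string can then be handed to HtmlAgilityPack through `HtmlDocument.LoadHtml(html)` instead of letting `HtmlWeb.Load` perform the request itself.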
