WebRequest does not return HTML

Question · Votes: -3 · Answers: 2

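I want to load the URL http://www.yellowpages.ae/categories-by-alphabet/h.html, but it returns null.

In answers to similar questions I've seen the suggestion to add a CookieContainer, but my code already has one.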

using HtmlAgilityPack;

var MainUrl = "http://www.yellowpages.ae/categories-by-alphabet/h.html";
HtmlWeb web = new HtmlWeb();
web.PreRequest += request =>
{
    // Attaches a brand-new, empty cookie container on every request
    request.CookieContainer = new System.Net.CookieContainer();
    return true;
};
web.CacheOnly = false;
var doc = web.Load(MainUrl); // this is the call that comes back null

The site opens perfectly in a browser.

c# web-scraping html-agility-pack webrequest
2 Answers

2 votes

You need a CookieCollection to capture the cookies, and you need to set UseCookies to true on the HtmlWeb:

using System.Net;
using HtmlAgilityPack;

CookieCollection cookieCollection = null;
var web = new HtmlWeb
{
    //AutoDetectEncoding = true,
    UseCookies = true,   // HtmlWeb attaches a CookieContainer to each request
    CacheOnly = false,
    PreRequest = request =>
    {
        // Replay the cookies captured from the previous response
        if (cookieCollection != null && cookieCollection.Count > 0)
            request.CookieContainer.Add(cookieCollection);

        return true;
    },
    // Remember the cookies the server sent back
    PostResponse = (request, response) => { cookieCollection = response.Cookies; }
};

var doc = web.Load("https://www.google.com");
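The point of the PreRequest/PostResponse pair is that cookies set by one response get sent back on the next request. The question's code instead assigns a new, empty CookieContainer inside PreRequest, so nothing is ever replayed. A minimal offline sketch of that difference using only System.Net.CookieContainer; the host www.example.com, the cookie session=abc, and the class and method names are hypothetical placeholders, not values from the site:

```csharp
using System;
using System.Net;

public static class CookieJarSketch
{
    // Hypothetical host used only for illustration.
    static readonly Uri Site = new Uri("http://www.example.com/");
    static readonly Uri NextPage = new Uri(Site, "categories-by-alphabet/h.html");

    // Mirrors the question's PreRequest: every request gets a new, empty jar,
    // so the cookie received on request #1 is not sent on request #2.
    public static string HeaderWithFreshJar()
    {
        var first = new CookieContainer();
        first.Add(Site, new Cookie("session", "abc")); // cookie set by response #1
        var second = new CookieContainer();            // request #2 starts from scratch
        return second.GetCookieHeader(NextPage);
    }

    // Mirrors the intent of the answer: cookies outlive a single request,
    // so the cookie from response #1 is sent back on request #2.
    public static string HeaderWithSharedJar()
    {
        var jar = new CookieContainer();
        jar.Add(Site, new Cookie("session", "abc"));
        return jar.GetCookieHeader(NextPage);
    }
}
```

The fresh jar yields an empty Cookie header, while the shared jar yields session=abc, which is what a server-side session check would look for.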

0 votes

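I doubt this is a cookie problem. It looks like gzip compression to me, because when I tried to fetch the page I only got gibberish. If it were a cookie problem, the response would have returned an error explaining it. Anyway, here is my solution.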

using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

public static class Program
{
    public static void Main(string[] args)
    {
        var doc = new HtmlDocument();
        try
        {
            var request = (HttpWebRequest)WebRequest.Create("http://www.yellowpages.ae/categories-by-alphabet/h.html");
            request.Method = "GET";
            // Ask for gzip/deflate and let the framework decompress the body transparently
            request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
            {
                // The stream is already decompressed here, so it can be parsed as UTF-8 HTML
                doc.Load(stream, Encoding.UTF8);
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine(ex.Message);
        }
        Console.WriteLine(doc.DocumentNode.InnerHtml);
        Console.ReadKey();
    }
}

All it does is decompress the gzip response we receive. How did I know it was gzip? In the debugger, the response's ContentEncoding was gzip.
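To make the "gibberish" explanation concrete, here is a small offline sketch that detects gzip the way the debugger revealed it (the Content-Encoding value) and, as a fallback, by the two gzip magic bytes 0x1F 0x8B defined in RFC 1952, then decompresses with System.IO.Compression.GZipStream. It is roughly what AutomaticDecompression does for you, limited to gzip for brevity. The class and method names are hypothetical, and the Gzip helper exists only to produce test input without a network call:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipSketch
{
    // Every gzip stream starts with the magic bytes 0x1F 0x8B (RFC 1952).
    public static bool LooksLikeGzip(byte[] body) =>
        body.Length >= 2 && body[0] == 0x1F && body[1] == 0x8B;

    // Decode a response body by hand. contentEncoding would come from
    // HttpWebResponse.ContentEncoding; deflate is deliberately not handled here.
    public static string DecodeBody(byte[] body, string contentEncoding, Encoding charset)
    {
        bool gzip = string.Equals(contentEncoding, "gzip", StringComparison.OrdinalIgnoreCase)
                    || LooksLikeGzip(body);
        if (!gzip)
            return charset.GetString(body); // not compressed: decode the bytes directly

        using (var input = new MemoryStream(body))
        using (var unzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(unzip, charset))
        {
            return reader.ReadToEnd();
        }
    }

    // Test helper: gzip a string the way a server might before sending it.
    public static byte[] Gzip(string text, Encoding charset)
    {
        using (var output = new MemoryStream())
        {
            using (var zip = new GZipStream(output, CompressionMode.Compress))
            {
                var bytes = charset.GetBytes(text);
                zip.Write(bytes, 0, bytes.Length);
            }
            // ToArray still works after the GZipStream has closed the MemoryStream
            return output.ToArray();
        }
    }
}
```

Decoding the gzipped bytes directly as UTF-8 gives the unreadable output described in the answer, while DecodeBody recovers the original HTML. In practice the one-line AutomaticDecompression setting is simpler; this is only meant to show what that setting does.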

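Basically, just add

request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

to your code and you're good. With HtmlWeb, the place for it is inside the PreRequest handler, whose request parameter is the HttpWebRequest.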
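If this code is later moved from HttpWebRequest (marked obsolete as SYSLIB0014 since .NET 6) to HttpClient, which is an assumption and not something the question asks about, the same switch lives on HttpClientHandler.AutomaticDecompression. A minimal sketch; the class and method names are hypothetical:

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class HttpClientSketch
{
    // Same effect as request.AutomaticDecompression on HttpWebRequest:
    // the handler advertises gzip/deflate and decompresses the body transparently.
    public static HttpClientHandler CreateHandler() => new HttpClientHandler
    {
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
    };

    // Fetch a page as a (already decompressed) string.
    public static async Task<string> FetchHtmlAsync(HttpClient client, string url)
    {
        using (var response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Typical use would be one long-lived client, new HttpClient(HttpClientSketch.CreateHandler()), with the returned string passed to HtmlAgilityPack's HtmlDocument.LoadHtml.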

© www.soinside.com 2019 - 2024. All rights reserved.