Logging in to the Scraping Testing Ground website and scraping the HTML afterwards

Question (votes: 0, answers: 1)

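I am building a web scraper, and I need to scrape HTML from a website that requires a login.

I have tried most of the Stack Overflow answers, but I could not find the one I was looking for. I don't know how to get the resulting HTML.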

var baseAddress = new Uri("http://testing-ground.scraping.pro/login");
var cookieContainer = new CookieContainer();
using (var handler = new HttpClientHandler() { CookieContainer = cookieContainer })
using (var client = new HttpClient(handler) { BaseAddress = baseAddress })
{
    // Usually I make a standard request without authentication first, e.g. to the home page.
    // This request stores some initial cookie values that might be used in the subsequent login request and checked by the server.
    var homePageResult = client.GetAsync("/login");
    homePageResult.Result.EnsureSuccessStatusCode();

    var content = new FormUrlEncodedContent(new[]
    {
        // The names of the form values must match the names of the <input /> tags of the login form, in this case the tag is <input type="text" name="username">
        new KeyValuePair<string, string>("usr", "admin"),
        new KeyValuePair<string, string>("pwd", "12345"),
    });
    var loginResult = client.PostAsync("/login", content).Result;
    loginResult.EnsureSuccessStatusCode();
    Console.WriteLine(loginResult);
}

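I want loginResult to report success only when

usr is admin

pwd is 12345

But it succeeds no matter what the values are. Also, my main goal is to scrape the resulting HTML, so in this case it should scrape the HTML that contains the welcome text rather than the login form.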

c# web-scraping login
1 Answer
0 votes

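OK, I figured it out! Here is the working code:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static async Task Login()
{
    using (var client = new HttpClient())
    {
        client.BaseAddress = new Uri("http://testing-ground.scraping.pro/");
        client.DefaultRequestHeaders.Accept.Clear();
        // Note: FormUrlEncodedContent already sets the Content-Type of the request body on its own.
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/x-www-form-urlencoded"));

        var username = "admin";
        var password = "12345";
        var formContent = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("usr", username),
            new KeyValuePair<string, string>("pwd", password),
        });

        // Post the form to the login endpoint and print the HTML that comes back.
        HttpResponseMessage responseMessage = await client.PostAsync("/login?mode=login", formContent);
        var response = await responseMessage.Content.ReadAsStringAsync();
        Console.WriteLine(response);
    }
}

After checking the headers in Fiddler, I realized that the request URL should be /login?mode=login rather than /login. I also changed DefaultRequestHeaders to application/x-www-form-urlencoded.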
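As for the other half of the question (success being reported whatever the credentials are): EnsureSuccessStatusCode only throws when the HTTP status code is outside the 2xx range, so a page that answers wrong credentials with a normal 200 response still counts as a success. One way around that is to look at the returned HTML itself. Below is a minimal sketch of that idea, not a definitive implementation. LoginCheck, LooksLoggedIn and LoginAndGetHtmlAsync are hypothetical names, and the success marker is an assumption: pass in whatever text only the logged-in page shows.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoginCheck
{
    // Hypothetical helper: decides from the response body whether the login worked.
    // successMarker is an assumption; use text that appears only on the logged-in page.
    public static bool LooksLoggedIn(string html, string successMarker)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(successMarker))
        {
            return false;
        }

        // Case-insensitive substring search over the returned HTML.
        return html.IndexOf(successMarker, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Posts the login form (field names "usr" and "pwd" come from the question)
    // and returns the resulting HTML. Expects client.BaseAddress to be set already.
    public static async Task<string> LoginAndGetHtmlAsync(HttpClient client, string user, string password)
    {
        var form = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("usr", user),
            new KeyValuePair<string, string>("pwd", password),
        });

        using (HttpResponseMessage response = await client.PostAsync("/login?mode=login", form))
        {
            // Guards against HTTP-level failures only, not against wrong credentials.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

With a client whose BaseAddress is http://testing-ground.scraping.pro/, you would call LoginAndGetHtmlAsync(client, "admin", "12345") and pass the result to LooksLoggedIn together with your marker. I have not verified what the welcome page actually contains, so check the real response (for example in Fiddler) before picking the marker.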
