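I am building a web scraper, and I need to scrape HTML from a website that requires a login.
I have tried most of the Stack Overflow answers, but I could not find the one I am looking for. I don't know how to get the resulting HTML.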
// Requires: System, System.Collections.Generic, System.Net, System.Net.Http
var baseAddress = new Uri("http://testing-ground.scraping.pro/login");
var cookieContainer = new CookieContainer();
using (var handler = new HttpClientHandler() { CookieContainer = cookieContainer })
using (var client = new HttpClient(handler) { BaseAddress = baseAddress })
{
    // Usually I make a standard request without authentication first, e.g. to the home page.
    // This request stores some initial cookie values that might be used in the
    // subsequent login request and checked by the server.
    var homePageResult = client.GetAsync("/login");
    homePageResult.Result.EnsureSuccessStatusCode();

    var content = new FormUrlEncodedContent(new[]
    {
        // The names of the form values must match the name attributes of the <input /> tags
        // of the login form (a value named "usr" corresponds to <input type="text" name="usr">).
        new KeyValuePair<string, string>("usr", "admin"),
        new KeyValuePair<string, string>("pwd", "12345"),
    });

    var loginResult = client.PostAsync("/login", content).Result;
    loginResult.EnsureSuccessStatusCode();
    Console.WriteLine(loginResult);
}
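I expect loginResult to be successful only when usr is admin and pwd is 12345, but it is positive no matter what values I send. Also, my main goal is to scrape the resulting HTML, so in this case it should scrape the HTML that contains the welcome text rather than the login form.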
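For context on why the check is always positive: EnsureSuccessStatusCode only throws when the HTTP status code is outside the 200-299 range, so it says nothing about whether the credentials were accepted. A server can answer 200 and simply render the login form again. Below is a minimal sketch of checking the body instead, assuming the login form can be recognized by its name="pwd" input. LoginCheck and LooksLikeLoginForm are hypothetical names, not part of the original code.

```csharp
using System;

public static class LoginCheck
{
    // Hypothetical helper: returns true when the HTML still contains the
    // password field of the login form (assumed here to be name="pwd").
    public static bool LooksLikeLoginForm(string html)
    {
        return html.IndexOf("name=\"pwd\"", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

With the code above, the HTML would come from loginResult.Content.ReadAsStringAsync() and then be passed to LoginCheck.LooksLikeLoginForm.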
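
Alright, I figured it out! After checking the headers in Fiddler, I realized that the request URL should be /login?mode=login instead of /login. I also changed DefaultRequestHeaders to application/x-www-form-urlencoded. Here is the working code:

// Requires: System, System.Collections.Generic, System.Net.Http, System.Net.Http.Headers, System.Threading.Tasks
public static async Task Login()
{
    using (var client = new HttpClient())
    {
        client.BaseAddress = new Uri("http://testing-ground.scraping.pro/");

        // Send an Accept header with the form media type (the change made after inspecting the headers in Fiddler).
        client.DefaultRequestHeaders.Accept.Clear();
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/x-www-form-urlencoded"));

        var username = "admin";
        var password = "12345";
        var formContent = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("usr", username),
            new KeyValuePair<string, string>("pwd", password),
        });

        // Post the credentials to the login endpoint, then read the response body as HTML.
        HttpResponseMessage responseMessage = await client.PostAsync("/login?mode=login", formContent);
        var response = await responseMessage.Content.ReadAsStringAsync();
        Console.WriteLine(response);
    }
}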
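
A side note on the header change: Accept describes which response types the client prefers, while the Content-Type of the POST body is set by FormUrlEncodedContent itself. Here is a small offline sketch that shows what the content object sends. FormContentDemo and DescribeAsync are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class FormContentDemo
{
    // Builds the same form content as the login request and returns the
    // encoded body together with the media type of its Content-Type header.
    public static async Task<(string Body, string MediaType)> DescribeAsync()
    {
        var formContent = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("usr", "admin"),
            new KeyValuePair<string, string>("pwd", "12345"),
        });

        // No network access is involved: the content is serialized in memory.
        var body = await formContent.ReadAsStringAsync();
        var mediaType = formContent.Headers.ContentType?.MediaType;
        return (body, mediaType);
    }
}
```

Calling FormContentDemo.DescribeAsync() yields the body usr=admin&pwd=12345 and the media type application/x-www-form-urlencoded, so the content type is in place without touching DefaultRequestHeaders.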