How to solve a "slider captcha" using Scrapy-sharp

Question  Votes: 0  Answers: 1

I am trying to scrape the Taobao website using Scrapy-sharp. I am able to log in (fill in the username and password), but it then goes to a page that shows a "slider captcha".

I know that we can use the puppeteer library to solve this. But I am also facing other challenges with puppeteer (see https://stackoverflow.com/posts/comments/103786166?noredirect=1).

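My question is: is there a way to solve the slider captcha using Scrapy-sharp and HtmlAgilityPack? How can we get the bounding box of the slider, and then trigger mouse events in Scrapy-sharp?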

My scraping code is as follows:

    using System;
    using ScrapySharp.Html.Forms;
    using ScrapySharp.Network;

    ScrapingBrowser Browser = new ScrapingBrowser();
    // Browser has settings you can access in setup
    Browser.AllowAutoRedirect = true;
    Browser.AllowMetaRedirect = true;

    // Load the login page and submit the credentials
    WebPage PageResult = Browser.NavigateToPage(new Uri("https://login.m.taobao.com/login_oversea.htm?loginFrom=wap_tmall&assets_js=mui%2Ffeloader%2F4.0.22%2Ffeloader-min.js,mui%2Ftmapp-standalone%2F4.0.3%2Fseed.js,mui%2Ftmapp-standalone%2F4.0.3%2Flogin-download.js&assets_css=3.0.8%2Fmobile%2Ftmallh5.css&redirectURL=https%3A%2F%2Fwww.tmall.com%2F"));
    PageWebForm form = PageResult.FindFormById("loginForm");
    form["TPL_username"] = "<<someusername>>";
    form["TPL_password"] = "********";
    form.Method = HttpVerb.Post;
    WebPage resultsPage = form.Submit();

    // Fill in the search form on the page returned after login
    PageWebForm searchForm = resultsPage.FindForm("searchTop");
    searchForm.Method = HttpVerb.Post;
    searchForm["q"] = "nike";

    // subsequent pages
    //var postResults = searchForm.Submit(new Uri(@"https://list.tmall.com/m/search_items.htm?page_size=20&page_no=3&q=Nike&type=p&tmhkh5=&spm=a220m.6910245.a2227oh.d100&from=mallfp..m_1_searchbutton&searchType=&closedKey="));
    // 1st page
    var postResults = searchForm.Submit(new Uri(@"https://list.tmall.com/search_product.htm?q=nike&type=p&tmhkh5=&spm=a220m.8599659.a2227oh.d100&from=mallfp..m_1_searchbutton&searchType=default&closedKey="));

    // Earlier attempt at submitting the verification form directly
    //PageWebForm verForm = resultsPage.FindFormById("verifyForm");
    //verForm.Method = HttpVerb.Post;
    //verForm.Action = "https://passport.taobao.com/iv/h5/h_5_verify_modes.htm";
    //WebPage postResults = verForm.Submit();

    //var divs = JsonConvert.DeserializeObject<RootObject>(postResults.Content);
    // Select the product nodes from the search results
    var divs = postResults.Html.SelectNodes("//div[@class='product  ']");

But after submitting the form, it redirects to a page with the slider captcha. Any hints or suggestions on how to solve this?

web-scraping recaptcha scrapysharp
1 answer
0 votes

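It is worth studying the API calls the browser makes before and after you submit this form manually. We may be able to reproduce those calls programmatically. Did you manage to get through? I am looking for a solution to a problem similar to yours.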
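
To make the idea of reproducing captured calls concrete, here is a minimal sketch using the standard HttpClient classes. It only shows how a request copied from the browser's network tab could be rebuilt and sent with a shared cookie container. The class name CapturedCallReplay, its methods, and every URL or form field passed to it are hypothetical placeholders for illustration, not Taobao's actual API. Replaying calls this way may also not be enough if the verification depends on values computed by JavaScript in the page.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: rebuilds a request captured in the browser's developer tools.
public static class CapturedCallReplay
{
    // Creates a client that keeps cookies between calls, so that a session
    // established by an earlier request is sent along with later ones.
    public static HttpClient CreateClient(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            AllowAutoRedirect = true
        };
        return new HttpClient(handler);
    }

    // Builds a form-encoded POST that mirrors a captured call.
    // endpoint, referer and formFields must be copied from the real capture;
    // this sketch does not know or assume what they are.
    public static HttpRequestMessage BuildRequest(
        Uri endpoint, Uri referer, IDictionary<string, string> formFields)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint)
        {
            Content = new FormUrlEncodedContent(formFields)
        };
        request.Headers.Referrer = referer;
        return request;
    }

    // Sends the request and returns the response body as text.
    public static async Task<string> ReplayAsync(HttpClient client, HttpRequestMessage request)
    {
        using (HttpResponseMessage response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

A caller would create one CookieContainer, pass it to CreateClient, and then call BuildRequest and ReplayAsync once per captured call, in the same order the browser made them. Comparing each response body with what the browser received is a quick way to see at which step the replay starts to diverge.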
