Scraping a site: Cloudflare suddenly returns a 403 error, but restarting the scraper makes it work again

Question
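  1. I am scraping the site's data using two parameters, area and isLet.
  2. The site's Cloudflare protection suddenly starts returning 403 errors, especially when isLet changes.
  3. Oddly, as soon as I restart my scraper console program, it works again.
  4. While the error is occurring, the same request still works in Postman.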
```csharp
// SetRequest, Log, UserRepository, PropertyGuruJsonModel and subscriberList are
// defined elsewhere in the program (not shown in the question).
private static void ScrapPropertyGuru(string area, short areaId, bool isLet)
{
    bool isStop = false;
    int page = 1;
    // Hard-coded Cloudflare __cf_bm cookie. It is only sent if the commented-out
    // "request.CookieContainer = cookie;" line below is enabled.
    CookieContainer cookie = new CookieContainer();
    cookie.SetCookies(new Uri("https://api.propertyguru.com"),
        "__cf_bm=MIuH1Q2WO3a662hCCSO5lB02DJv57UW7bzO28UhYNrI-1709981654-1.0.1.1-9_mTLz0ch.u07hsHOXhR3IwXP5HdHTHL8RrgdygYTsIKYXJLKp30ZUivIl3Un9x_mf2bO3wYmOQBifwXAa23Sg");
    do
    {
        try
        {
            var sw = new Stopwatch();
            sw.Start();
            var limit = 20;
            // The rent and sale query strings differ only in listing_type and the
            // minbath/minbed filters.
            var url = isLet
                ? "&limit=" + limit + "&status_code=ACT&region=my&locale=en&access_token=00409KSSGORVv6fth6OqNKvR44hIwlG3Qeyk6~h-dvsAh3AqgK-y5yWpJ23VK0zM36zZf1r~hZAgivDmBGBK06zF0SVsf9tbJTLMathkDjTBzWgdbnQdoVGJ9PurZ~0HS~yWctxybE1VDGscY6RKf5rmk7RObyyvm2MLdyMI4qyt1xj150TIdtKI~3vGJQflnWaSEbxqps-SbiVZpwEZg0Q1J2jWVt9ebP3xsMH1m6TzNJ28z9Q9fr0VApPj~Wo6Ji&region_code=" + area + "&_floorAreaUnits=sqft&_landAreaUnits=sqft&_floorLengthUnits=ft&_landLengthUnits=ft&include_featured=true&listing_type=rent&market=residential&check_spotlight=true&_includePhotos=true&check_premium=true&minbath=0&minbed=0&isPoiSearch=false"
                : "&limit=" + limit + "&status_code=ACT&region=my&locale=en&access_token=00409KSSGORVv6fth6OqNKvR44hIwlG3Qeyk6~h-dvsAh3AqgK-y5yWpJ23VK0zM36zZf1r~hZAgivDmBGBK06zF0SVsf9tbJTLMathkDjTBzWgdbnQdoVGJ9PurZ~0HS~yWctxybE1VDGscY6RKf5rmk7RObyyvm2MLdyMI4qyt1xj150TIdtKI~3vGJQflnWaSEbxqps-SbiVZpwEZg0Q1J2jWVt9ebP3xsMH1m6TzNJ28z9Q9fr0VApPj~Wo6Ji&region_code=" + area + "&_floorAreaUnits=sqft&_landAreaUnits=sqft&_floorLengthUnits=ft&_landLengthUnits=ft&include_featured=true&listing_type=sale&market=residential&check_spotlight=true&_includePhotos=true&check_premium=true&isPoiSearch=false";
            var request = (HttpWebRequest)WebRequest.Create(
                "https://api.propertyguru.com/v1/listings?page=" + page + url);
            SetRequest(request, "malaysia; consumer; iOS; 2024.2.29; null", false, true);

            //request.CookieContainer = cookie;
            //request.Proxy = new WebProxy("183.89.59.91", 8080);

            // GetResponse() throws a WebException when the server answers 403.
            // The using blocks close the response even if reading fails.
            string rt;
            using (var response = request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                rt = reader.ReadToEnd();
            }
            var jsonResponse = JsonConvert.DeserializeObject<PropertyGuruJsonModel>(rt);
            Log(rt);

            // Store one entry per agent mobile number.
            var userRepo =
                new UserRepository(
                    "Data Source=LOCALHOST\\SQLEXPRESS;Initial Catalog=Tracker;Integrated Security=True;Pooling=True;Min Pool Size=20;Max Pool Size=2000;Connect Timeout=12000;Application Name=Tracker");
            userRepo.InsertPhoneNumberByBulk(
                jsonResponse.listings
                    .Where(a => a.agent.mobile != null)
                    .GroupBy(a => a.agent.mobile)
                    .Select(a => new Tuple<string, string, short, string>(
                        a.Key.ToString(),
                        $"Hi, {a.First().localizedTitle} {a.First().price.pretty} still available? Thanks.",
                        areaId,
                        a.First().agent.name))
                    .ToList(),
                out var successNumberList);
            foreach (var number in successNumberList)
            {
                var uniqueInfo = jsonResponse.listings.FirstOrDefault(a => a.agent.mobile == number);
                if (uniqueInfo != null)
                {
                    var recipientList = subscriberList.Where(a => a.Item1 == areaId);
                    // WhatsApp click-to-chat link for the agent
                    string message =
                        $"https://api.whatsapp.com/send?phone={uniqueInfo.agent.mobile}&text=Hi {uniqueInfo.agent.name}, {uniqueInfo.localizedTitle} {uniqueInfo.price.pretty} still available? Thanks.";

                    foreach (var recipient in recipientList)
                    {
                        //SendAlertTelegram(Uri.EscapeUriString(message), recipient.Item2);
                    }
                }
            }
            var logType = isLet ? "rent" : "sale";
            Log(
                $"PropertyGuru Area:[{area}] [{logType}] Processed Page: {page}. Total: {jsonResponse.total} | Time taken {sw.ElapsedMilliseconds / 1000} seconds. Total phone added: {successNumberList.Count}");

            page += 1;
            if (page > jsonResponse.total / limit || page == 1000)
            {
                isStop = true;
            }

            //Random rnd = new Random();
            //int num = rnd.Next(3, 10);
            Thread.Sleep(1000);   // 1-second pause after each successful page
        }
        catch (Exception ex)
        {
            // A 403 lands here: it is only logged, and the loop retries the same page
            // immediately, without the Thread.Sleep(1000) above. If the error
            // persists, isStop is never set and the loop never ends.
            Log($"{ex.ToString()} Area: {area} Page: {page}", true);
        }
    } while (!isStop);
}
```
What I have tried:

  1. Switching to requests without cookies.
  2. Changing the page size (limit) to several different values.
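One difference from Postman, which stores cookies returned by the server in its cookie jar, is that the posted code creates a new `HttpWebRequest` for every page without setting a `CookieContainer`, so any cookie Cloudflare sets (such as `__cf_bm`) is ignored instead of being sent back. Whether this is what triggers the 403 is not established in this thread; as an assumption worth testing, here is a minimal sketch that shares one `HttpClient` and one `CookieContainer` across all pages and both `isLet` passes. `ScraperHttp` and `CreateClient` are hypothetical names, the 30-second timeout is illustrative, and the headers added by the asker's `SetRequest` helper are not shown in the question, so they are left out.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper (not from the question): one HttpClient and one
// CookieContainer reused for every request, so cookies set by the server via
// Set-Cookie are stored and sent back on later requests.
public static class ScraperHttp
{
    // Shared cookie jar; declared before Client so it is initialized first.
    public static readonly CookieContainer Cookies = new CookieContainer();

    // Created once and reused by every page and by both the rent and sale passes.
    public static readonly HttpClient Client = CreateClient(Cookies);

    public static HttpClient CreateClient(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            UseCookies = true,          // send and store cookies through the container
            CookieContainer = cookies,
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
        // Illustrative timeout; adjust as needed.
        return new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(30) };
    }
}
```

Inside `ScrapPropertyGuru`, each page would then be fetched with something like `string rt = await ScraperHttp.Client.GetStringAsync("https://api.propertyguru.com/v1/listings?page=" + page + url);`, which requires making the method `async`.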
Answer

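Cloudflare is widely used for its security services and DDoS protection, and one of its features is identifying and blocking automated traffic that it considers potentially harmful or unwanted.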

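When your scraper switches the isLet parameter and interacts with the site in a way that Cloudflare classifies as non-human (for example, sending many requests in a short time, or changing parameters in a way a typical user would not), this can trigger Cloudflare's security rules and produce a 403 Forbidden error. When you restart the scraper, however, it may temporarily get past Cloudflare's detection, so it works again for a while.

Note also that in the posted code a 403 surfaces as a WebException that the catch block only logs. The do/while loop then requests the same page again immediately, skipping the Thread.Sleep(1000), so once the first 403 arrives the scraper sends requests back to back, which is likely to look even more automated.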
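Rather than retrying immediately, the scraper can slow down once it is blocked. The following is a minimal sketch, assuming the code is moved from `HttpWebRequest` to a shared `HttpClient`; `BackoffFetcher`, `ComputeDelay` and `GetWithBackoffAsync` are hypothetical names, and the limits are illustrative, not values Cloudflare documents. On a 403 or 429 it waits `baseDelay * 2^attempt` (capped at `maxDelay`), or the server's `Retry-After` delay when one is sent, before requesting the same URL again, and it gives up after `maxAttempts`.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: retries a GET with exponential backoff when the server
// answers 403 (Forbidden) or 429 (Too Many Requests).
public static class BackoffFetcher
{
    // Delay before the retry that follows attempt number `attempt` (0-based):
    // baseDelay * 2^attempt, capped at maxDelay. A Retry-After delta, if given, wins
    // (also capped at maxDelay).
    public static TimeSpan ComputeDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay,
                                        TimeSpan? retryAfter = null)
    {
        if (retryAfter.HasValue)
            return retryAfter.Value < maxDelay ? retryAfter.Value : maxDelay;
        double ms = baseDelay.TotalMilliseconds * Math.Pow(2, attempt);
        return TimeSpan.FromMilliseconds(Math.Min(ms, maxDelay.TotalMilliseconds));
    }

    public static async Task<string> GetWithBackoffAsync(
        HttpClient client, string url, int maxAttempts,
        TimeSpan baseDelay, TimeSpan maxDelay, CancellationToken ct = default)
    {
        for (int attempt = 0; ; attempt++)
        {
            using HttpResponseMessage response = await client.GetAsync(url, ct);
            bool blocked = response.StatusCode == HttpStatusCode.Forbidden
                        || response.StatusCode == (HttpStatusCode)429;
            if (!blocked)
            {
                response.EnsureSuccessStatusCode();   // throws on other 4xx/5xx codes
                return await response.Content.ReadAsStringAsync();
            }
            if (attempt + 1 >= maxAttempts)
                throw new HttpRequestException(
                    $"Still blocked ({(int)response.StatusCode}) after {maxAttempts} attempts: {url}");

            // Only the delta-seconds form of Retry-After is used; an HTTP-date value is ignored.
            TimeSpan? retryAfter = response.Headers.RetryAfter?.Delta;
            await Task.Delay(ComputeDelay(attempt, baseDelay, maxDelay, retryAfter), ct);
        }
    }
}
```

For example, `await BackoffFetcher.GetWithBackoffAsync(client, url, maxAttempts: 5, baseDelay: TimeSpan.FromSeconds(5), maxDelay: TimeSpan.FromMinutes(5))` waits 5, 10, 20 and 40 seconds between its five attempts (when no `Retry-After` is sent) and throws if every attempt is blocked. Doubling the wait keeps a persistent block from turning into the back-to-back retries described above.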

© www.soinside.com 2019 - 2024. All rights reserved.