How can I keep scraping data when a site detects unusual traffic?

Question

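I'm trying to build a web scraper for fun, but I've run into a dilemma. I found a Chinese website with some interesting data and decided to write a bot that: 1) requests the site's URL, 2) incrementally adjusts that URL to move to different pages, 3) searches the DOM to find the thumbnail, and 4) saves the thumbnail to my desktop. I wrote two nested if/else statements that catch the cases where a page either doesn't exist or has no thumbnail, and therefore lacks the class that holds the image. If either case occurs, the script prints the page number whose thumbnail was not saved to my computer and moves on.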

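Here's the problem: after the code has been running for a while, it stops saving images to my computer, even for pages that do have a thumbnail. I think the website has noticed that I'm trying to pull data, because I get an "unusual traffic detected" warning. Any help would be greatly appreciated!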

import requests
import urllib.request
from bs4 import BeautifulSoup


localfile = "C:/Users/MYNAME/Desktop/Chinese Games/"
url = "https://www.9game.cn/xiazai/"


for x in range(1, 500):
    page = requests.get(url + str(x) + "/")         # Request the URL for page x
    soup = BeautifulSoup(page.content, 'lxml')
    image = soup.find(class_="d-headgame-icon")     # Find the HTML element that holds the image
    if image is not None:
        result = image.find("img").attrs['src']     # Extract the URL of the image
        if result != "":
            title = image.find("img").attrs['alt']  # Extract the name of the image
            urllib.request.urlretrieve(result, localfile + title + ".jpg")  # Save the image to the Desktop as a JPG file
        else:
            print(x)    # Print the page number if the image URL is empty
    else:
        print(x)        # Print the page number if the page has no image element
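
One possible way to tell a throttled response apart from a page that really has no thumbnail is to check each response before parsing it, pause between requests, and back off when the warning appears. The following is only a minimal sketch under stated assumptions: the helper names (looks_blocked, backoff_delay, polite_get), the status codes checked, the marker strings, and the delay values are all hypothetical and not taken from the site. It slows the bot down; it does not bypass any blocking. The session argument is assumed to be something like requests.Session(), or any object with a compatible get(url, timeout=...) method.

```python
import random
import time

# Assumption: the exact wording of the site's warning page is not known from
# the question, so these marker strings are guesses to adjust as needed.
BLOCK_MARKERS = ("检测到异常流量", "unusual traffic")


def looks_blocked(status_code, body_text):
    """Heuristic: True if the response looks like a throttle or block page."""
    if status_code in (403, 429, 503):
        return True
    return any(marker in body_text for marker in BLOCK_MARKERS)


def backoff_delay(attempt, base=2.0, cap=120.0):
    """Exponential backoff in seconds: base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * (2 ** attempt))


def polite_get(session, url, max_attempts=5, pause=(1.0, 3.0), sleep=time.sleep):
    """Fetch url with a random pause before each try; back off when blocked.

    Returns the response, or None if every attempt still looked blocked.
    """
    for attempt in range(max_attempts):
        sleep(random.uniform(*pause))        # spread requests out over time
        response = session.get(url, timeout=10)
        if not looks_blocked(response.status_code, response.text):
            return response
        sleep(backoff_delay(attempt))        # wait longer after each block
    return None
```

In the loop above, page = polite_get(session, url + str(x) + "/") could stand in for the direct requests.get call, with session = requests.Session() created once before the loop. A None result would then mean "stop or wait and retry later" rather than "this page has no image", so blocked pages are not silently counted as missing thumbnails.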
