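I wrote this code to fetch a website's content, but there is a problem: when the connection is interrupted, the program stops and does not try to reconnect.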
from urllib.request import Request, urlopen

url = 'https://website.com'

def get_page_content(url, head):
    """
    Function to get the page content
    """
    req = Request(url, headers=head)
    return urlopen(req)

# Request headers that mimic a desktop browser
head = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
    'Referer': 'https://example.com',
    'cookie': """your cookie value ( you can get that from your web page) """
}

# Read the raw response body
data = get_page_content(url, head).read()
I searched the internet but did not find a solution.
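When a request fails, you can use the status to decide whether to retry. The object returned by get_page_content(url, head) has a status attribute. If the status is 200, 201 or 204, everything is fine. In any other case you should handle it.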
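As a rough sketch of that status check: the helper names is_ok and fetch_status, and the opener parameter, are made up here for illustration and are not part of the original code. One thing to keep in mind is that urlopen raises HTTPError for 4xx and 5xx responses instead of returning them, so the status has to be read from the exception in that case. A dropped connection produces no status at all.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

OK_STATUSES = {200, 201, 204}  # the codes treated as success above


def is_ok(status):
    """Return True if the HTTP status code counts as success."""
    return status in OK_STATUSES


def fetch_status(url, head, opener=urlopen):
    """Return the HTTP status for url, or None if no response arrived.

    opener is a hypothetical hook (defaulting to urlopen) so the
    function can be tried out without a real network connection.
    """
    req = Request(url, headers=head)
    try:
        with opener(req) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the code is on the exception
        return err.code
    except OSError:
        # URLError and connection resets are OSError subclasses: no status
        return None
```

A caller could then retry whenever is_ok(fetch_status(url, head)) is false.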
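Python has try/except blocks that you can add. Basically, you put the code that fails (specifically the line that raises the error) in the try part, and the except part contains the code you want to run when an error occurs. You can use this in a loop together with a flag to keep trying to connect a certain number of times: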
from urllib.request import Request, urlopen

def get_page_content(url, head):
    success_flag = False
    while not success_flag:  # keep looping until success
        try:  # code in here will attempt to run
            req = Request(url, headers=head)
            page = urlopen(req)
            success_flag = True  # mark as successful run if got through the code that may fail
            return page
        except Exception as e:  # this code will run in the event of a failure
            print(e)  # just print the exception before trying again
            # possibly change something before reconnecting, depending on the error thrown
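Keep in mind that you did not post a traceback, so there may be a problem in the code itself. This code will do what you asked and try again. You may need to change something before trying to reconnect.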
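If you would rather give up after a fixed number of attempts than loop forever, a bounded variant might look like the sketch below. The function name get_page_content_with_retry and the parameters max_attempts, delay, opener and sleep are all assumptions added for illustration. It catches OSError, which covers URLError, HTTPError and connection resets, and waits a little longer before each new attempt.

```python
import time
from urllib.request import Request, urlopen


def get_page_content_with_retry(url, head, max_attempts=5, delay=2.0,
                                opener=urlopen, sleep=time.sleep):
    """Try to open url up to max_attempts times, pausing between tries.

    opener and sleep are hypothetical hooks (defaulting to urlopen and
    time.sleep) so the retry logic can be tested without a network.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return opener(Request(url, headers=head))
        except OSError as err:  # URLError, HTTPError, connection resets
            last_error = err
            print(f"attempt {attempt} failed: {err}")
            if attempt < max_attempts:
                sleep(delay * attempt)  # wait a little longer each time
    raise last_error  # every attempt failed; surface the last error
```

Usage would be the same as before, for example data = get_page_content_with_retry(url, head).read(). Re-raising the last error after the final attempt keeps the traceback available, which helps with finding out why the connection keeps failing.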