What is the best way to get the HTTP response code from a URL?

Question (votes: 0, answers: 8)

I'm looking for a quick way to get the HTTP response code (i.e. 200, 404, etc.) from a URL. I'm not sure which library to use.

python
8 answers
131 votes

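Update using the excellent requests library. Note that we use a HEAD request, which should be faster than a full GET or POST because the server sends back only the headers, not the body. Also note that requests.head() does not follow redirects by default, so a URL that redirects reports its 3xx code; pass allow_redirects=True if you want the status of the final page.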

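import requests

try:
    # requests has no default timeout, so set one to avoid hanging forever
    r = requests.head("https://stackoverflow.com", timeout=10)
    print(r.status_code)
    # prints the int of the status code*
except requests.RequestException:
    # covers ConnectionError, Timeout, invalid URLs, etc.
    print("failed to connect")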

*For more information on status codes, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
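HEAD is cheaper because no body is transferred, but some servers do not implement it and answer 405 Method Not Allowed (or 501 Not Implemented) even though a GET to the same URL works. Below is a minimal sketch of a HEAD-then-GET fallback, assuming the standard library's urllib.request instead of requests. The names status_with_fallback, _HeadRefusingHandler and start_demo_server are hypothetical, the local server exists only so the sketch can be tried offline, and treating only 405/501 as "HEAD not supported" is an assumption.

```python
import http.server
import threading
import urllib.error
import urllib.request


def status_with_fallback(url, timeout=10):
    """Return the status code for url: try HEAD first, retry with GET if HEAD is refused."""
    for method in ("HEAD", "GET"):
        req = urllib.request.Request(url, method=method)
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status
        except urllib.error.HTTPError as e:
            # urlopen raises HTTPError for 4xx/5xx; the status is on e.code.
            if method == "HEAD" and e.code in (405, 501):
                continue  # assumption: the server only rejects HEAD, so retry with GET
            return e.code


class _HeadRefusingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical demo server that rejects HEAD but serves GET."""

    def do_HEAD(self):
        self.send_response(405)
        self.end_headers()

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


def start_demo_server():
    """Start the demo server on a free local port and return its base URL."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _HeadRefusingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return "http://127.0.0.1:%d/" % server.server_port


if __name__ == "__main__":
    # HEAD gets 405 from the demo server, so the GET retry supplies the status
    print(status_with_fallback(start_demo_server()))
```

The fallback costs an extra round trip only for servers that refuse HEAD; everywhere else it behaves like a plain HEAD request.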


65 votes

Here is a solution using httplib instead (Python 2; in Python 3 the module is called http.client).

import httplib

def get_status_code(host, path="/"):
    """ This function retrieves the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        None instead.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        return conn.getresponse().status
    except StandardError:
        return None


print get_status_code("stackoverflow.com") # prints 200
print get_status_code("stackoverflow.com", "/nonexistent") # prints 404
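httplib and StandardError exist only in Python 2. A minimal Python 3 sketch of the same HEAD-based function follows, assuming the module's Python 3 name http.client. Catching OSError (DNS errors, refused connections, timeouts) together with http.client.HTTPException (malformed replies) is a substitute for StandardError, not part of the original answer; the timeout parameter is also an addition.

```python
import http.client


def get_status_code(host, path="/", timeout=10):
    """Return the status of a HEAD request to http://host/path, or None on failure."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Socket-level failures and unparseable responses both mean "no usable status".
        return None
    finally:
        conn.close()
```

Note that a plain-HTTP request to a site that redirects to HTTPS typically returns a 3xx status (such as 301) rather than 200; http.client.HTTPSConnection accepts the same host and timeout arguments if you need to talk HTTPS directly.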

26 votes

You should use urllib2 (Python 2), like this:

import urllib2
for url in ["http://entrian.com/", "http://entrian.com/does-not-exist/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

# Prints:
# 200 [from the try block]
# 404 [from the except block]

9 votes

For those using Python 3 and later, here is another way to get the response code.

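import urllib.request

def getResponseCode(url):
    # urlopen() raises urllib.error.HTTPError for 4xx/5xx responses,
    # so this only returns a code when the final response is not an error.
    with urllib.request.urlopen(url) as conn:
        return conn.getcode()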
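urlopen() follows redirects automatically, so getcode() reports the status of the final response, not the 301/302 the original URL returned. If you need the redirect code itself, one approach is a redirect handler that declines to follow: when redirect_request() returns None, urllib raises HTTPError carrying the 3xx code. A sketch under that approach; NoRedirect, status_without_redirects and the local demo server (which exists only so the example runs offline) are hypothetical names.

```python
import http.server
import threading
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None tells urllib not to follow; the 3xx then surfaces as HTTPError.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


# build_opener() drops its default HTTPRedirectHandler because NoRedirect subclasses it.
_opener = urllib.request.build_opener(NoRedirect)


def status_without_redirects(url, timeout=10):
    """Return the status of url itself, without following redirects."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # 3xx, 4xx and 5xx all arrive here


class _RedirectingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical demo server: /old redirects to /new, everything else returns 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


def start_demo_server():
    server = http.server.HTTPServer(("127.0.0.1", 0), _RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return "http://127.0.0.1:%d" % server.server_port


if __name__ == "__main__":
    base = start_demo_server()
    with urllib.request.urlopen(base + "/old") as conn:
        print(conn.getcode(), conn.geturl())  # urlopen followed the redirect to /new
    print(status_without_redirects(base + "/old"))  # the 302 itself
```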

3 votes

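Depending on the Python version, the urllib2.HTTPError exception may not have a getcode() method. The code attribute is always set, so use it instead.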


2 votes

Addressing @Niklas R's comment on @nickanor's answer:

from urllib.error import HTTPError
import urllib.request

def getResponseCode(url):
    try:
        conn = urllib.request.urlopen(url)
        return conn.getcode()
    except HTTPError as e:
        return e.code
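HTTPError covers the case where the server answered with an error status. If no HTTP response arrives at all (DNS failure, refused connection), urlopen() raises URLError instead, and a timeout while waiting for the response can surface as a socket timeout. A sketch that returns None in those cases, assuming None is an acceptable "no response" marker for the caller; get_response_code is a hypothetical name. HTTPError is a subclass of URLError, which in turn subclasses OSError, so the order of the except clauses matters.

```python
import urllib.error
import urllib.request


def get_response_code(url, timeout=10):
    """Return the HTTP status code for url, or None if no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return conn.getcode()
    except urllib.error.HTTPError as e:
        # The server answered with a 4xx/5xx status.
        return e.code
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts are OSError
        # subclasses; HTTPError is one too, which is why it is caught first.
        return None
```

For example, get_response_code("http://127.0.0.1:1/") should return None on a machine where nothing listens on port 1. Malformed URLs such as "not-a-url" still raise ValueError, since that is a programming error rather than a network failure.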

1 vote

It depends on several factors, but try testing these methods:

import requests

def url_code_status(url):
    try:
        response = requests.head(url, allow_redirects=False)
        return response.status_code
    except Exception as e:
        print(f'[ERROR]: {e}')

Or:

import http.client as httplib
import urllib.parse

def url_code_status(url):
    try:
        protocol, host, path, query, fragment = urllib.parse.urlsplit(url)
        if protocol == "http":
            conntype = httplib.HTTPConnection
        elif protocol == "https":
            conntype = httplib.HTTPSConnection
        else:
            raise ValueError("unsupported protocol: " + protocol)
        # Keep the query string; an empty path is sent as "/".
        target = path or "/"
        if query:
            target += "?" + query
        conn = conntype(host, timeout=10)
        try:
            conn.request("HEAD", target)
            return conn.getresponse().status
        finally:
            conn.close()  # close even if the request fails
    except Exception as e:
        print(f'[ERROR]: {e}')

Benchmark results for 100 URLs:

  • First method: 20.90
  • Second method: 23.15
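When URLs are checked one after another, the total time is roughly the sum of the individual request times, most of which is spent waiting on the network. A common alternative is to run the checks in a thread pool. A minimal sketch with concurrent.futures.ThreadPoolExecutor, assuming a HEAD-based check similar to the second url_code_status variant above; head_status, check_all and max_workers=20 are illustrative choices, and no timings are claimed for this version.

```python
import concurrent.futures
import http.client
import urllib.parse


def head_status(url, timeout=10):
    """Return the status of a HEAD request to url, or None if it fails."""
    parts = urllib.parse.urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    elif parts.scheme == "http":
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    else:
        return None  # unsupported scheme
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    try:
        conn.request("HEAD", target)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def check_all(urls, max_workers=20):
    """Check urls concurrently and return {url: status or None}."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each URL with its status.
        return dict(zip(urls, pool.map(head_status, urls)))
```

Threads suit this workload because each check spends most of its time blocked on I/O; duplicate URLs collapse into a single dictionary key.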

0 votes

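Here is an httplib solution that behaves like urllib2: you just give it a URL and it works. There is no need to split the URL into host name and path yourself; the function does that for you.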

import httplib
import re
import socket

def get_link_status(url):
  """
    Gets the HTTP status of the url or returns an error associated with it.  Always returns a string.
  """
  https=False
  url=re.sub(r'(.*)#.*$',r'\1',url)  # strip any #fragment
  url=url.split('/',3)
  if len(url) > 3:
    path='/'+url[3]
  else:
    path='/'
  if url[0] == 'http:':
    port=80
  elif url[0] == 'https:':
    port=443
    https=True
  else:
    return "Unsupported protocol: %s" % url[0]
  if ':' in url[2]:
    host=url[2].split(':')[0]
    port=int(url[2].split(':')[1])
  else:
    host=url[2]
  try:
    headers={'User-Agent':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0',
             'Host':host
             }
    if https:
      conn=httplib.HTTPSConnection(host=host,port=port,timeout=10)
    else:
      conn=httplib.HTTPConnection(host=host,port=port,timeout=10)
    conn.request(method="HEAD",url=path,headers=headers)
    response=str(conn.getresponse().status)
    conn.close()
  except socket.gaierror,e:
    response="Socket Error (%d): %s" % (e.args[0],e.args[1])
  except StandardError,e:
    # getcode() returns an int, so test it directly instead of calling len() on it
    if hasattr(e,'getcode') and e.getcode():
      response=str(e.getcode())
    elif hasattr(e, 'message') and len(e.message) > 0:
      response=str(e.message)
    elif hasattr(e, 'msg') and len(e.msg) > 0:
      response=str(e.msg)
    elif type('') == type(e):
      response=e
    else:
      response="Exception occurred without a good error message.  Manually check the URL to see the status.  If you believe this URL is 100% good, file an issue for a potential bug."
  return response
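The manual split('/') parsing above has to handle the scheme, port, fragment and path by hand. In Python 3, urllib.parse.urlsplit already exposes these pieces (scheme, hostname, port, path, query, fragment), so the same request target can be derived with less code. A sketch of just that parsing step; request_parts is a hypothetical helper, and the default ports 80/443 mirror the function above.

```python
import urllib.parse


def request_parts(url):
    """Split url into (use_https, host, port, target) for an http.client request."""
    parts = urllib.parse.urlsplit(url)  # any #fragment lands in parts.fragment and is ignored
    if parts.scheme not in ("http", "https"):
        raise ValueError("unsupported protocol: %r" % parts.scheme)
    use_https = parts.scheme == "https"
    # parts.port is None when the URL has no explicit port
    port = parts.port or (443 if use_https else 80)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return use_https, parts.hostname, port, target
```

For example, request_parts("https://example.com:8443/a/b?x=1#top") returns (True, "example.com", 8443, "/a/b?x=1"). Note that parts.hostname is lowercased, so "http://Example.com" yields the host "example.com".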
© www.soinside.com 2019 - 2024. All rights reserved.