urllib.request.urlopen(url) with authentication

Question · votes: 0 · answers: 5

I've been playing around with Beautiful Soup and parsing web pages for the last few days. One line of code has been my saviour in every script I've written:

r = requests.get('some_url', auth=('my_username', 'my_password'))

But...

I want to do the same thing (open a URL that requires authentication) with:

(1) sauce = urllib.request.urlopen(url).read()
(2) soup = bs.BeautifulSoup(sauce, "html.parser")

I can't open and read a URL that requires authentication. How can I achieve something like:

(3) sauce = urllib.request.urlopen(url, auth=(username, password)).read()

instead of (1)?
python python-3.x url beautifulsoup request
5 Answers

32 votes

You are using HTTP Basic Authentication:

# Python 2 (urllib2)
import urllib2, base64

request = urllib2.Request(url)
base64string = base64.b64encode('%s:%s' % (username, password))
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request)

So you should base64-encode the username and password and send them in the Authorization header.
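Since the question is tagged python-3.x, note that `urllib2` was merged into `urllib.request` in Python 3, and `base64.b64encode` there operates on bytes rather than str. A sketch of the same approach ported to Python 3 (the URL and credentials are placeholders):

```python
import base64
import urllib.request

url = "http://example.com/protected"  # placeholder
username, password = "my_username", "my_password"  # placeholders

request = urllib.request.Request(url)
# In Python 3, b64encode needs bytes in and gives bytes out,
# so encode the credentials first and decode the result for the header
credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
request.add_header("Authorization", f"Basic {credentials}")

# result = urllib.request.urlopen(request)  # performs the actual request
print(request.get_header("Authorization"))
```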


24 votes

See "How to fetch internet resources using the urllib package" in the official documentation:

import urllib.request

# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)

# use the opener to fetch a URL
opener.open(a_url)

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
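The docs snippet above stops at `opener.open(a_url)` against an unspecified server. As a self-contained way to watch the handler's 401-challenge-then-retry behaviour without a real protected site, one can spin up a throwaway local server (everything below, including the credentials, is illustrative and not part of the original answer):

```python
import base64
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Throwaway credentials for the local demo server (placeholders)
USERNAME, PASSWORD = "alice", "secret"
EXPECTED = "Basic " + base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()

class BasicAuthDemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            body = b"hello, authenticated world"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # The 401 + WWW-Authenticate challenge is what makes
            # HTTPBasicAuthHandler retry the request with credentials.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="demo"')
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), BasicAuthDemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
a_url = f"http://127.0.0.1:{server.server_port}/"

# Same recipe as the answer above
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, a_url, USERNAME, PASSWORD)
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(handler)
urllib.request.install_opener(opener)

# The first attempt gets a 401, the handler retries with the header set
with urllib.request.urlopen(a_url) as res:
    page = res.read().decode()

server.shutdown()
print(page)
```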

6 votes

With urllib3:

import urllib3

http = urllib3.PoolManager()
myHeaders = urllib3.util.make_headers(basic_auth='my_username:my_password')
http.request('GET', 'http://example.org', headers=myHeaders)

1 vote

Use this. This is the standard urllib that comes with the Python 3 install. It works fine, guaranteed. See also the gist.

import urllib.request

url = 'http://192.168.0.1/'

auth_user="username"
auth_passwd="^&%$$%^"

passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, auth_user, auth_passwd)
authhandler = urllib.request.HTTPBasicAuthHandler(passman)
opener = urllib.request.build_opener(authhandler)
urllib.request.install_opener(opener)

res = urllib.request.urlopen(url)
res_body = res.read()
print(res_body.decode('utf-8'))

0 votes

Using urllib in python3, you can use the following code:

# Using urllib as this is a built-in tool and there is no need to install any third-party lib
import urllib.error
import urllib.request

from bs4 import BeautifulSoup

auth = (
    "my_username",
    "my_password",
)


def install_authenticated_request_opener(top_level_url):
    """This function needs to be called only once.

    Once the opener is installed, subsequent calls will use the same
    authentication.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, top_level_url, auth[0], auth[1])

    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

    # create "opener" (OpenerDirector instance)
    opener = urllib.request.build_opener(handler)

    # Install the opener.
    # Now all calls to urllib.request.urlopen use our opener.
    urllib.request.install_opener(opener)


def download_html(some_url):
    html_request = urllib.request.Request(some_url)
    try:
        result = urllib.request.urlopen(html_request)
    except urllib.error.URLError as e:
        # handling errors is important 😉
        # (only HTTPError carries a status code, hence getattr)
        print(
            "Network call failed. Error code:",
            getattr(e, "code", "no HTTP status code"),
            "reason:",
            getattr(e, "reason", "missing reason"),
        )
        # uncomment to re-raise this exception
        # raise
    else:
        # Everything is fine
        html_text = result.read()
        parsed_html = BeautifulSoup(html_text, "html.parser")
        # Do something with the parsed HTML

If you are using the requests library, then you can use the code below:

import requests
from bs4 import BeautifulSoup

auth = (
    "my_username",
    "my_password",
)


def download_html(some_url):
    resp = requests.get(some_url, auth=auth)
    if resp.status_code != 200:
        print(
            "Network call failed. Error code:",
            resp.status_code or "no HTTP status code",
        )
        # uncomment to raise an exception instead
        # resp.raise_for_status()
    else:
        # Everything is fine
        parsed_html = BeautifulSoup(resp.text, "html.parser")
        # Do something with the parsed HTML
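For completeness: the `auth=` tuple in requests produces exactly the same Basic `Authorization` header as the urllib recipes above. This can be checked offline by preparing a request without sending it (the URL is a placeholder):

```python
import base64
import requests

req = requests.Request(
    "GET", "http://example.com/", auth=("my_username", "my_password")
)
prepared = req.prepare()  # builds the headers without sending anything

# The header requests builds is the standard base64-encoded "user:password"
expected = "Basic " + base64.b64encode(b"my_username:my_password").decode("ascii")
print(prepared.headers["Authorization"] == expected)
```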

If you are using python2, then for the urllib solution you have to use this SO answer.

You can also read this SO answer about the advantages of the requests lib over urllib.