Python requests.compat.urljoin - prepending the scheme when the URL does not include one

Question  Votes: 0  Answers: 2

Below are two ways I tried to prepend https:// to a URL. For some reason, the urljoin approach gives strange output:

from requests.compat import urljoin

host = 'abc.def.com'
host2 = host

# brute-force string method
if not host.startswith('https://'):
    host = 'https://' + host  # Add scheme
if host.endswith('/'):
    host = host[:-1]          # Strip /
print('Stringy way', host)

# nice library method? Doesn't quite work
print('urljoin    ', urljoin('https://', host2))

The output I see, with an odd run of three slashes (///), is:

Stringy way https://abc.def.com
urljoin     https:///abc.def.com

Other variants give invalid results as well:

print('urljoin #2 ', urljoin('https:/', host2))
print('urljoin #3 ', urljoin('https:', host2))
print('urljoin #4 ', urljoin('https', host2))

These give:

urljoin #2  https:///abc.def.com
urljoin #3  https:///abc.def.com
urljoin #4  abc.def.com

Is urljoin the wrong function for this?

python https python-requests
2 Answers
1
vote

You can use urllib.parse.urlunsplit() to compose the URL:

from urllib.parse import urlunsplit

print(urlunsplit(("https", "abc.def.com", "", "", "")))

Result:

https://abc.def.com

It takes a 5-tuple as input, in the same order as the output of urlsplit(): scheme, netloc, path, query, fragment.
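
To apply this to a host string that may or may not already carry a scheme, something like the following could work. This is only a sketch: `with_scheme` is a hypothetical helper name, and prefixing `//` so that urlsplit() reads the bare host as a netloc is my own assumption about how to handle the input, not something from the question.

```python
from urllib.parse import urlsplit, urlunsplit


def with_scheme(host, scheme="https"):
    """Hypothetical helper: return host as a URL, adding a scheme if it has none."""
    if "://" in host:
        parts = urlsplit(host)
    else:
        # A leading '//' makes urlsplit() treat the host as the netloc
        # instead of as a relative path.
        parts = urlsplit("//" + host)
    path = parts.path.rstrip("/")  # drop a trailing slash, as in the question
    return urlunsplit(
        (parts.scheme or scheme, parts.netloc, path, parts.query, parts.fragment)
    )


print(with_scheme("abc.def.com"))           # https://abc.def.com
print(with_scheme("abc.def.com/"))          # https://abc.def.com
print(with_scheme("https://abc.def.com/"))  # https://abc.def.com
```

Unlike the startswith('https://') check in the question, this leaves an existing scheme such as http:// untouched rather than stacking a second one in front of it.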


0
votes

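The urljoin function is normally used to resolve a relative reference, such as the href of an anchor, against an existing URL. Example: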

from requests.compat import urljoin
url = 'https://abc.def.com'
href = '364'
urljoin(url, href)

I get the output:

'https://abc.def.com/364'

However, if I just want to prefix my URL with 'https://', I would rather use:

url = 'abc.def.com'
host = 'https://' + url  # plain concatenation; urljoin is not needed here
print(host)

My output is:

https://abc.def.com
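
To see why urljoin('https://', host) ends up with three slashes, it may help to look at how the two arguments are parsed. The sketch below uses urllib.parse directly, which is where requests.compat.urljoin comes from on Python 3. The extra path /a/b is made up for illustration.

```python
from urllib.parse import urljoin, urlsplit

base = urlsplit('https://')
ref = urlsplit('abc.def.com')

# The base has a scheme but an empty host (netloc).
print(base.scheme, repr(base.netloc))  # https ''

# Without '//' in front, the host name is parsed as a relative path.
print(repr(ref.netloc), ref.path)      # '' abc.def.com

# urljoin resolves the second argument relative to the first.
print(urljoin('https://abc.def.com/a/b', 'c'))   # https://abc.def.com/a/c
print(urljoin('https://abc.def.com/a/b', '/c'))  # https://abc.def.com/c
```

So urljoin sees a relative path to resolve against a base with an empty host, and the result keeps that empty host between the slashes.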

I hope this helps.
