ConnectionResetError with the Python requests and urllib libraries when accessing a specific URL


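I get a ConnectionResetError when trying to access a specific URL with the Python requests and urllib libraries. Even though I send appropriate headers, the remote host forcibly closes the connection. The problem is persistent, and I would like to understand its cause and possible solutions.

Here is the code snippet I am using: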

import requests

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'DNT': '1',
    'Pragma': 'no-cache',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Google Chrome";v="123", "Not:A-Brand";v="8", "Chromium";v="123"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}

response = requests.get('https://newjersey.mylicense.com/verification/Search.aspx', headers=headers)

This is the error I receive:

('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

Libraries in use:

  • requests==2.31.0
  • urllib3==2.2.1

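I tried to access the URL with both the requests and urllib libraries, sending the headers needed to mimic a browser request. I expected the connection to be established so that I could retrieve the content. Instead, I consistently get a ConnectionResetError, which indicates that the remote host forcibly closed the connection.

The URL works as expected when accessed through a web browser, which suggests that the problem lies on the Python client side rather than with the server itself.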

python web-scraping https python-requests urllib
1 Answer

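This took a fair amount of troubleshooting to work out. The main issue is that the cipher the server uses for the TLS connection ('AES256-GCM-SHA384') is not one of the default ciphers that the ssl package offers when establishing a secure connection. The handshake therefore fails and the server resets the connection, which produces the error we see.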
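To check this claim on a given machine, the cipher list of a default context can be inspected. The following is a minimal sketch, not part of the original answer: the helper names offered_cipher_names and offers_cipher are hypothetical, and the result depends on the Python and OpenSSL versions installed.

```python
import ssl


def offered_cipher_names(context):
    """Return the names of the cipher suites this context will offer."""
    return [cipher["name"] for cipher in context.get_ciphers()]


def offers_cipher(context, name):
    """Return True if the context offers the cipher with this OpenSSL name."""
    return name in offered_cipher_names(context)


# Same kind of context that the failing example below creates.
default_context = ssl.create_default_context()

# The answer depends on the local Python/OpenSSL build, so no output is shown.
print("AES256-GCM-SHA384 offered by default:",
      offers_cipher(default_context, "AES256-GCM-SHA384"))
```

If this prints False, the client never proposes the only cipher the server appears to accept, which is consistent with the reset during the handshake.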

Diagnosis

Using curl works, but its output is not very informative.

C:\>curl "https://newjersey.mylicense.com/verification/Search.aspx" -vv --head
*   Trying 208.95.153.120:443...
* Connected to newjersey.mylicense.com (208.95.153.120) port 443
* schannel: disabled automatic use of client certificate
* ALPN: curl offers http/1.1
* ALPN: server did not agree on a protocol. Uses default.
* using HTTP/1.x
> HEAD /verification/Search.aspx HTTP/1.1
> Host: newjersey.mylicense.com
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
...

Python failure

Creating the connection manually in Python fails with the same error you are seeing:

import socket
import ssl

host = 'newjersey.mylicense.com'
context = ssl.create_default_context()

data = b"""HEAD /verification/Search.aspx HTTP/1.1
Host: newjersey.mylicense.com
User-Agent: python/3.11.8
Accept: */*

"""

with socket.create_connection((host, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as secure_sock:
        secure_sock.send(data)
        print(secure_sock.read().decode())

# raises:
File ~\envs\test\Lib\ssl.py:1379, in SSLSocket.do_handshake(self, block)
   1377     if timeout == 0.0 and block:
   1378         self.settimeout(None)
-> 1379     self._sslobj.do_handshake()
   1380 finally:
   1381     self.settimeout(timeout)

ConnectionResetError: [WinError 10054] An existing connection was forcibly
closed by the remote host

I then turned to openssl to create the connection manually. This is where we finally find the information we need. (The output is verbose.)

C:\>openssl s_client -connect newjersey.mylicense.com:443

The connection succeeds and prints the following (I have removed some of the output for brevity):

CONNECTED(000001B4)
depth=2 C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.", ...
verify return:1
...
---
Certificate chain
 0 s:CN = *.mylicense.com
   i:C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.", OU = ...
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: May 28 22:06:00 2023 GMT; NotAfter: Jun 28 07:22:12 2024 GMT
...
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIGkjCCBXqgAwIBAgIJAKhBrHwkidbVMA0GCSqGSIb3DQEBCwUAMIG0MQswCQYD
...
-----END CERTIFICATE-----
subject=CN = *.mylicense.com
issuer=...
---
No client certificate CA names sent
---
SSL handshake has read 4236 bytes and written 647 bytes
Verification: OK
---
New, TLSv1.2, Cipher is AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : AES256-GCM-SHA384
    Session-ID: ...
    Session-ID-ctx:
    Master-Key: ...
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1712847797
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: yes
---

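The important parts are:

SSL handshake has read 4236 bytes and written 647 bytes
TLSv1.2, Cipher is AES256-GCM-SHA384

Here the handshake succeeds, and the output tells us which TLS version and cipher were used. requests can negotiate TLS 1.2 by default, so the protocol version is not the problem. That leaves only trying a different cipher.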
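The same two facts can also be read from the Python side once a handshake succeeds. The sketch below is an illustration rather than part of the original answer: make_context and negotiated_parameters are hypothetical helper names, and the timeout value is an arbitrary assumption.

```python
import socket
import ssl


def make_context(ciphers=None):
    """Build a verifying default context, optionally limited to an
    OpenSSL cipher string such as 'AES256-GCM-SHA384'."""
    context = ssl.create_default_context()
    if ciphers is not None:
        context.set_ciphers(ciphers)
    return context


def negotiated_parameters(host, port=443, ciphers=None, timeout=10.0):
    """Connect, complete the TLS handshake, and return
    (tls_version, cipher_name, secret_bits)."""
    context = make_context(ciphers)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as secure_sock:
            # cipher() returns (name, protocol version, secret bits).
            name, _protocol, bits = secure_sock.cipher()
            return secure_sock.version(), name, bits
```

Calling negotiated_parameters('newjersey.mylicense.com', ciphers='AES256-GCM-SHA384') should report the values that openssl printed if the server still behaves as described, while calling it without ciphers is expected to raise the same ConnectionResetError.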

In fact, it only takes one extra line in the previous Python code:

import socket
import ssl

host = 'newjersey.mylicense.com'
context = ssl.create_default_context()
context.set_ciphers('AES256-GCM-SHA384')

data = b"""HEAD /verification/Search.aspx HTTP/1.1
Host: newjersey.mylicense.com
User-Agent: python/3.11.8
Accept: */*

"""

with socket.create_connection((host, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as secure_sock:
        secure_sock.send(data)
        print(secure_sock.read().decode())

And finally we get the expected output:

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 43543
Content-Type: text/html; charset=utf-8
Expires: -1
Server: Microsoft-IIS/8.5
Set-Cookie: ASP.NET_SessionId=iejmoxg
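The question uses requests rather than raw sockets, so the same cipher setting has to reach the SSLContext that requests hands to urllib3. One common way to do that is a transport adapter. The following is a sketch under assumptions, not something the original answer shows: CipherAdapter and make_session are hypothetical names, and it targets the requests 2.31.0 and urllib3 2.2.1 versions listed in the question (other releases may treat a custom ssl_context differently).

```python
import ssl

import requests
from requests.adapters import HTTPAdapter


class CipherAdapter(HTTPAdapter):
    """Transport adapter whose connections use a restricted cipher list."""

    def __init__(self, ciphers, **kwargs):
        # Must be set first: HTTPAdapter.__init__ calls init_poolmanager().
        self._ciphers = ciphers
        super().__init__(**kwargs)

    def _build_context(self):
        # Certificate and hostname verification stay enabled.
        context = ssl.create_default_context()
        context.set_ciphers(self._ciphers)
        return context

    def init_poolmanager(self, *args, **kwargs):
        kwargs["ssl_context"] = self._build_context()
        return super().init_poolmanager(*args, **kwargs)

    def proxy_manager_for(self, *args, **kwargs):
        kwargs["ssl_context"] = self._build_context()
        return super().proxy_manager_for(*args, **kwargs)


def make_session():
    """Return a Session that uses the custom cipher only for this host."""
    session = requests.Session()
    session.mount("https://newjersey.mylicense.com/",
                  CipherAdapter("AES256-GCM-SHA384"))
    return session
```

With this in place, make_session().get('https://newjersey.mylicense.com/verification/Search.aspx', headers=headers) would go through the adapter. Mounting it on the single host prefix is a deliberate choice: AES256-GCM-SHA384 uses RSA key exchange without forward secrecy, so it seems safer not to apply it to every HTTPS request. For the standard library, urllib.request.urlopen accepts a context argument that can take a context configured the same way.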