wget and urllib fail to load a file from a URL on Python 3

Question (votes: 0, answers: 3)

I am trying to download a txt file from a given URL and port. On Python 2 this works as follows:

Python 2.7.12 (default, Sep 26 2016, 09:46:23)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> foo = urllib.urlopen("http://catnet-ip.icc.cat:8080/")
>>> foo.read() 
'SOURCETABLE 200 OK\r\nServer: NTRIP Trimble NTRIP Caster\r\nContent-Type: text/plain\r\nContent-Length: 2884\r\nDate:
02/Nov/2016:12:52:19 UTC\r\n\r\nSTR;VRS_RTK_2_3;Virtual RTK ver RTCM
2.3;RTCM 2.3;1(1),3(6),18(1),19(1),23(5),24(5);2;GPS;Catnet;ESP;41.3;2.09;1;1;Trimble
GPSNet;None;B;N;3900;;\r\nSTR;VRS_RTK_3_0;Virtual RTK ver RTCM
3.0;RTCM 3;1004(1),1005/1007(5),PBS(10);2;GPS;Catnet;ESP;41.3;2.09;1;1;Trimble
GPSNet;None;B;N;1100;;\r\nSTR;VRS_DGPS;Virtual DGPS ver RTCM 2.3;RTCM
2.3;1(1),3(6),22(6),23/24(5),16(59);0;GPS;Catnet;ESP;41.3;2.09;1;1;Trimble
GPSNet;None;B;N;640;;\r\n 
...

The same goes for wget:

Python 2.7.12 (default, Sep 26 2016, 09:46:23)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import wget
>>> foo = wget.download("http://catnet-ip.icc.cat:8080/", bar=None)
>>> foo
' (1).'
>>> exit()
$ less \ \(1\).
SOURCETABLE 200 OK\r\nServer: NTRIP Trimble NTRIP Caster\r\nContent-Type: text/plain\r\nContent-Length: 2884\r\nDate:
02/Nov/2016:12:52:19 UTC\r\n\r\nSTR;VRS_RTK_2_3;Virtual RTK ver RTCM
2.3;RTCM 2.3;1(1),3(6),18(1),19(1),23(5),24(5);2;GPS;Catnet;ESP;41.3;2.09;1;1;Trimble
GPSNet;None;B;N;3900;;\r\nSTR;VRS_RTK_3_0;Virtual RTK ver RTCM
3.0;RTCM 3;1004(1),1005/1007(5),PBS(10);2;GPS;Catnet;ESP;41.3;2.09;1;1;Trimble
GPSNet;None;B;N;1100;;\r\nSTR;VRS_DGPS;Virtual DGPS ver RTCM 2.3;RTCM
2.3;1(1),3(6),22(6),23/24(5),16(59);0;GPS;Catnet;ESP;41.3;2.09;1;1;Trimble
GPSNet;None;B;N;640;;\r\n 
...

On Python 3, however, both fail with the error "http.client.BadStatusLine: SOURCETABLE 200 OK":

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 26 2016, 10:47:25)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> foo = urllib.request.urlopen("http://catnet-ip.icc.cat:8080/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 466, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 484, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1282, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1257, in do_open
    r = h.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 279, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: SOURCETABLE 200 OK

And:

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 26 2016, 10:47:25)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import wget
>>> wget.download("http://catnet-ip.icc.cat:8080/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/toni/Downloads/wget-2.0/wget.py", line 308, in download
    (tmpfile, headers) = urllib.urlretrieve(url, tmpfile, callback)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 188, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 466, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 484, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1282, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1257, in do_open
    r = h.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 279, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: SOURCETABLE 200 OK

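From the Python documentation on the HTTP protocol, I gather that this happens because urllib and wget try to interpret the "SOURCETABLE" tag, which is the first thing in the file I want to load, as part of an HTTP status line. That tag is always present in the files I want to download (they come from NTRIP casters), but I cannot find a way around the problem.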
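The failure can be reproduced without a network connection. The following is a minimal sketch, not part of the original question: it feeds canned bytes to http.client.HTTPResponse through a hypothetical FakeSocket helper (HTTPResponse only needs an object with a makefile() method). The sample payloads are made up for illustration.

```python
import io
import http.client


class FakeSocket:
    """Hypothetical stand-in for a socket; HTTPResponse only calls makefile()."""

    def __init__(self, payload: bytes):
        self._file = io.BytesIO(payload)

    def makefile(self, *args, **kwargs):
        return self._file


def try_parse(payload: bytes) -> str:
    """Return the parsed status code, or a description of the parse failure."""
    response = http.client.HTTPResponse(FakeSocket(payload))
    try:
        response.begin()  # reads the status line and the headers
    except http.client.BadStatusLine as exc:
        return "BadStatusLine: " + str(exc).strip()
    return str(response.status)


# A regular HTTP response parses; an NTRIP-style first line is rejected
# because it does not start with "HTTP/".
print(try_parse(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"))
print(try_parse(b"SOURCETABLE 200 OK\r\nContent-Type: text/plain\r\n\r\n"))
```

This suggests the problem lies in the first line of the reply rather than in the port or the URL.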

python-2.7 python-3.x gps wget urllib
3 Answers
1 vote

I ran into the same problem with a different NTRIP server. According to RFC 2616, SOURCETABLE 200 OK is not a valid HTTP status line. Sigh. My workaround: curl, specifically pycurl.

For example:

import sys
import pycurl

def handle_write(buf):
    # pycurl passes bytes to its callbacks; decode before writing to stdout.
    sys.stdout.write(buf.decode("iso-8859-1"))

host = "http://catnet-ip.icc.cat:8080/"
curl = pycurl.Curl()
curl.setopt(pycurl.URL, host)
curl.setopt(pycurl.TIMEOUT, 20)
curl.setopt(pycurl.CONNECTTIMEOUT, 3)
# On Python 3, sys.stdout.write rejects bytes, so headers use handle_write too.
curl.setopt(pycurl.HEADERFUNCTION, handle_write)
curl.setopt(pycurl.WRITEFUNCTION, handle_write)

curl.perform()
curl.close()

Result:

SOURCETABLE 200 OK
Server: GNSS Spider 7.4.0.8125/1.0
Date: Wed, 18 Dec 2019 21:29:11 GMT Standard Time
Content-Type: text/plain
Content-Length: 2667

STR;VRS_RTK_3_0;VRS_RTK_3_0;RTCM 
...
ENDSOURCETABLE 
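Once the raw bytes are available, the reply can be split into its parts by hand. The following is a minimal sketch under stated assumptions: the function name parse_sourcetable is hypothetical, the reply is assumed to use "\r\n" line endings with a blank line between headers and body, and each record is simply split on ";" without interpreting the fields.

```python
def parse_sourcetable(raw: bytes):
    """Split a SOURCETABLE reply into (status line, headers, records)."""
    text = raw.decode("iso-8859-1")
    # The headers end at the first blank line, as in a regular HTTP reply.
    head, _, body = text.partition("\r\n\r\n")
    head_lines = head.split("\r\n")
    status_line = head_lines[0]

    headers = {}
    for line in head_lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()

    records = []
    for line in body.splitlines():
        line = line.strip()
        if line == "ENDSOURCETABLE":
            break  # marks the end of the table
        if line:
            records.append(line.split(";"))
    return status_line, headers, records


# Made-up sample data in the same shape as the output above.
sample = (
    b"SOURCETABLE 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"STR;VRS_RTK_3_0;VRS_RTK_3_0;RTCM 3\r\n"
    b"ENDSOURCETABLE\r\n"
)
status_line, headers, records = parse_sourcetable(sample)
print(status_line, headers, records)
```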

1 vote
1. Via urllib3:

import urllib3
http = urllib3.PoolManager()
foo = http.request('GET', "http://example.com:2101/")
print(foo.data)

2. Via urllib: edit the file as described in the next method, then try this:

import urllib.request
f = urllib.request.urlopen('http://splcare.in:2101')
print(f.read())

3. Via the http module: you can modify the Python http library so that it accepts the extra status line. The file to edit is:

{python program folder}/lib/http/client.py

Find the lines containing this code:

if not version.startswith("HTTP/") :
    self._close_conn()
    raise BadStatusLine(line) 

Change them to:

if not (version.startswith("HTTP/") or version.startswith("SOURCETABLE")):
    self._close_conn()
    raise BadStatusLine(line)

A second modification applies to the following lines:

elif version.startswith("HTTP/1."):
    self.version = 11   # use HTTP/1.1 code for HTTP/1.x where x>=1
else:
    raise UnknownProtocol(version)

Change them to:

elif version.startswith("HTTP/1."):
    self.version = 11   # use HTTP/1.1 code for HTTP/1.x where x>=1
elif version.startswith("SOURCETABLE"):
    self.version = 12   # Use NTRIP
else:
    raise UnknownProtocol(version)

Now you can fetch the source table from the Python console with the following code, although more files would need to be edited to support NTRIP fully.

import requests
url = "http://example.com:2101"
username = "xxxx"
password = "xxxx"
response = requests.get(url, auth=(username, password))
print(response.status_code)
print(response.content)

0 votes

Another approach is to use a raw socket from the standard library:

import socket

with socket.create_connection(("catnet-ip.icc.cat", 2101)) as sock:
    req_payload = b"GET / HTTP/1.1\r\nHost: catnet-ip.icc.cat\r\n\r\n"
    sock.sendall(req_payload)

    # Keep reading until the server closes the connection (recv returns b"").
    resp_payload = sock.recv(4096)
    while data := sock.recv(4096):
        resp_payload += data

Now you can parse resp_payload yourself, or let http.client.HTTPResponse do it, with a small modification to get past the SOURCETABLE status line:

SOURCETABLE = b"SOURCETABLE"

http_payload = (
    resp_payload.replace(SOURCETABLE, b"HTTP/1.1".ljust(len(SOURCETABLE)), 1)
    if resp_payload.startswith(SOURCETABLE)
    else resp_payload
)
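To hand the rewritten payload to http.client.HTTPResponse, one option is to wrap the bytes in an object that looks enough like a socket. The following is a minimal sketch, not a definitive implementation: FakeSocket and parse_with_http_client are hypothetical names, and the sample payload is made up, with a Content-Length that matches its body.

```python
import io
from http.client import HTTPResponse

SOURCETABLE = b"SOURCETABLE"


class FakeSocket:
    """Hypothetical helper: HTTPResponse only needs a makefile() method."""

    def __init__(self, payload: bytes):
        self._file = io.BytesIO(payload)

    def makefile(self, *args, **kwargs):
        return self._file


def parse_with_http_client(resp_payload: bytes):
    """Swap the SOURCETABLE tag for an HTTP version, then parse the reply."""
    if resp_payload.startswith(SOURCETABLE):
        # Pad with spaces so the payload keeps its original length.
        resp_payload = resp_payload.replace(
            SOURCETABLE, b"HTTP/1.1".ljust(len(SOURCETABLE)), 1
        )
    response = HTTPResponse(FakeSocket(resp_payload))
    response.begin()  # parses the status line and headers
    return response.status, dict(response.getheaders()), response.read()


# Made-up sample reply for illustration.
body = b"STR;VRS_RTK_3_0;VRS_RTK_3_0;RTCM 3\r\nENDSOURCETABLE\r\n"
payload = (
    b"SOURCETABLE 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    b"\r\n" + body
)
status, headers, data = parse_with_http_client(payload)
print(status, headers, data)
```

The padding keeps the byte count unchanged, and the status line still parses because the extra spaces are treated as ordinary whitespace between the version and the status code.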